

AD-A151  549 


FINAL  REPORT 

The  Development  of  a  Programming  Support  System 
for  Rapid  Prototyping 
N00014  -  82  -  C  -  0173 
January  1984  -  January  1985 
SO-01-85


Prepared  for 

Office  of  Naval  Research 
Department  of  the  Navy 
800  N.  Quincy  Street 
Arlington,  Virginia  22217 


Software  Options,  Inc. 
22  Hilliard  Street 
Cambridge,  Mass.  02138 
Tel.  (617)  497-5054 




This document has been approved
for public release and sale; its
distribution is unlimited.


Summary 


This report describes the results of work on Tasks 2 and 3 of this
project, initially conceived as a five-year project to develop a
programming support environment and a collection of tools that support
rapid prototyping. For a broader overview of the project, and in
particular of the support environment, see the final report for
Task 1 [1]. The initial project was scaled back, and we report here on
narrow aspects of the problems that played a role in the larger system.


The  principal  work  in  Task  2  was  the  design  of  a  new 
method  for  code-generation,  particularly  oriented  to  the 
needs  and  capabilities  of  the  programming  support  environment. 
This resulted in a rather large, self-contained document,
Code Generation by Coagulation, which is included in its
original form as part of this report.


Task  3  was  to  have  been  an  effort  to  prototype  some  of 
the  code-generation  ideas  developed  in  Task  2,  in  particular, 
an  analyzer  that  builds  an  intermediate  form  for  bi-directional 
scanning  of  a  program,  a  necessary  constituent  of  the  optimizing 
code-generator.  Task  3  also  called  for  developing  overall 
specifications for the Rulog language and interpreter and developing
a prototype of the interpreter. Due to the limitation of
funds, only a design, not a prototype, of the bi-directional
scanner was eventually supported; this work is reported on in
the  second  document  included  in  this  report.  The  work  on  Rulog 
is  reported  on  in  a  paper  that  has  been  submitted  to  the  8th 
International  Conference  on  Software  Engineering;  a  copy  of 
that  paper  is  attached. 


Bibliography 

1.  "The  Development  of  a  Programming  Support  System  for 
Rapid Prototyping - Final Report for Task 1", Technical
Report SO-01-83, Software Options, Inc., 22 Hilliard
Street,  Cambridge,  MA  02138. 




Code  Generation  by  Coagulation 


14 December 1983


Software Options, Inc.
22 Hilliard Street
Cambridge, MA 02138


[...] describes a new approach to code generation. The central tenet is that
there must be a closer coupling between register allocation and instruction
selection than exists in present-day technology. This is achieved by generating
code for small regions and gradually coalescing the part of the program that is
[...]


[...] was supported in part by ONR contract N00014-82-C-0173.




1. Introduction


1.1 Traditional Assumptions

The problem of designing a code generator for an interpreter-based language has
provided a chance to re-examine the traditional assumptions underlying the present
technology of code-generation. Probably the most pervasive of these assumptions is
that programs are compiled before they are run. In an interpretive environment, a
compiler sees a program only after it is mostly debugged; changes to a program are
usually tested with the interpreter before the program is recompiled. This
comparative infrequency of compilation has two consequences, one of which is
obvious: a compiler for such an environment can run slower than other compilers,
since it is used less often. The less obvious implication of having run programs
before they are compiled is that one can design a compiler whose optimization
techniques are fundamentally dependent upon execution statistics gathered when
running the program on "typical" data.

Another assumption underlying present compilers is that subroutines are generally
large and are called infrequently. While this was less absurd twenty years ago,
language implementations continue to have rigid calling sequences, register
conventions, etc. As programmers are being taught more and more to break down
their programs into smaller, more easily understood pieces, it is increasingly
important not to punish them with increased computing costs. A variety of
parameter and result passing mechanisms, convenient register conventions between
caller and callee, cheap subroutine calls to non-recursive procedures (not to mention
direct substitution when possible) are all required to support good inter-procedural
optimization at the machine code level.

Code space optimization is an area neglected in most compilers, as if space were
free. A corollary to this is that time-space trade-offs are also neglected, particularly
because execution data is lacking. Though some compilers do indeed try to pay
special attention to "inner loops", the notion of inner loop becomes tenuous when
considering inter-procedural optimization. With the variety of computing
environments proliferating, concerns with space costs increase. On small personal
computers and in some embedded systems, the space cost may be of paramount
concern; in time-sharing environments, cost is usually a function of space and time, so
trade-offs are crucial in the overall optimization process; in paged systems, space can
convert indirectly into time because of page faults; in real-time systems, time costs
may dominate.

1.2 Traditional Techniques

In the initial stages of this code-generator design, there were two issues which
stood out as unsuccessfully treated in compiler designs with which I was familiar.
First, there was the fact that while code-generation is an optimization problem, the
objective function (to use a term from mathematical programming) does not enter in
a direct way into the process; rather it forms an implicit background for all that
happens. The reason for this is partly that execution information has been assumed
to be unavailable to compilers (see above); one of the consequences is that time-space
tradeoff questions have been neglected (again, see above). The second bothersome
issue is best summed up in the standard cliche about code-generators: instruction
selection is trivial once register allocation is done, and register allocation is trivial
once instruction selection is done.

Consider peephole optimization. This is one of the last phases of a compiler. Its
job is to rummage about in the code which has already been generated, to remove
obvious inefficiencies, and to detect patterns which can be more efficiently compiled
using instructions which were not issued by earlier phases in the compiler. It is
necessary to realize the important role played by this technique in today's best
optimizing compilers [8]: "It is without doubt the most ad hoc, least formalized, and
perhaps least aesthetically pleasing of the phases of the compiler. Yet it is one of the
most effective". Even if peephole optimization is effective, it is disturbing that it
works so well. Peephole optimizations are often allowed by the fact that certain
quantities are in registers (instruction selection is trivial if ...), but register allocation
was done much earlier, and might have been done differently, had there been any
knowledge of the effect it would have on peephole optimization (register allocation is
trivial if ...). And all of this is happening without any clear view of the ultimate
effect on performance.


1.3 A New Approach

With a new set of assumptions in mind, and with reservations about some aspects
of present code-generation techniques aroused, we begin to describe our proposed
approach. The place to begin is with the circularity cliche, and the driving idea is never
to break the circularity: instruction selection and register allocation always are done
together. This may seem impossible, but the trick is to look at a small enough piece
of the program, so that doing both is not only possible, but easy. Instead of
beginning by code generation (both register allocation and instruction selection)
which is safe globally, and patching it up with a peephole optimizer, we propose to
generate code which is optimal locally, and gradually paste the pieces together into a
coherent whole, modifying both register allocations and instructions in the process.

The order in which pieces are pasted together is crucial to this approach. It is
done in order of decreasing execution frequency, the idea being to get things
properly arranged on the expensive paths through the program. For example, inner
loops which are truly busy will be compiled first, registers arranged, etc. But those
pieces inside loops that are seldom used will have no influence on the initial register
assignments. When compiling these pieces, if there is some new register problem,
there are several ways to resolve it. Either the present code can be changed to make
it compatible, or the new piece can be compiled to work around existing conventions.
The relative costs of the two methods can be compared, and a rational choice made.

By now, the meaning of "coagulation" in the title of this work should be clearer.
Imagine the program, including the subroutines, spread out over a table, with the
compiler dropping Jello on the parts as they are compiled. At first little drops appear
in seemingly random places. These get bigger and combine with other drops to form
growing globs. When two globs meet, ripples will go out through each as they adjust
to each other's presence, although the parts of the globs that formed first are less
affected by the ripples. When compilation is complete, there is one congealed mass.


1.4 Scope and Limitation

This work is about code-generation, not about the entire process of compilation.
We will assume that before a compiler begins generating code, it will already have
extensively analyzed the program, and produced an intermediate form. While this is
not the place to discuss intermediate form (see chapter 5), we will claim here that
conventional languages can all be reduced to quite similar intermediate forms. There
may be great variations in surface syntax and applications among COBOL, LISP,
FORTRAN and ADA, but from the point of view of code-generation there is little
difference. On the other hand, this work makes no attempt to treat highly
unconventional languages. For example, the issues in a PROLOG compiler are simply
not considered here. Neither do we consider query languages for relational databases.


Any discussion of code-generation must consider target architecture. The
techniques presented here apply to conventional machines, such as the PDP-10,
IBM-370, MC68000, and VAX. These are characterized by generally serial operation,
on the order of 10 registers, and instructions that usually operate on or via the
registers. Special purpose machines, for example, the SCHEME chip, probably would
not benefit from this new approach; nor would highly pipelined or vector machines
(we certainly do not address the problem of parallelizing serial programs). Although
there is no intent to address the issue specifically, this design may be useful in the
generation of microcode, where the problem is to have the data in the right place at
the right time.


1.5 A Guide for the Reader

This work develops the techniques that are necessary to make the coagulation idea
into an algorithm, albeit a large one, for code-generation. There is much further
work that could be done; indications of topics to explore further are described as they
arise in the course of the discussion. The most pressing issue is that of an
implementation, which has not yet started. It is too often necessary to appeal to one's
sense of what is likely to be found in real programs, rather than referring to evidence
gathered by a compiler in everyday use.


For the reader wishing to skim this work, there is unfortunately a rather sequential
dependence of chapters. The best approach is to read from the beginning, until
tired. Chapter 2 outlines the mathematical objects that we will study. A natural
stopping point for the casual reader is at the end of this chapter. Chapter 3 considers
in more detail two relations introduced in Chapter 2, cohabitation and conflict.
These relations capture the competing influences in code-generation: the desire to
have values remain in the same place, for speed, and the necessity to have values be
in different places, to preserve the meaning of the program. The reader will have a
much better idea of coagulation at the end of Chapter 3.

Chapter 4 presents a technical device that is useful in representing the cohabitation
and conflict relations, and in detecting inconsistency between the two. Skipping this
chapter on first reading will cause only momentary confusion in Chapter 5, but will
leave the reader unprepared for Chapter 6. Chapter 5 provides an even better
perspective on coagulation, because of the thoroughgoing way in which it follows the
imperative to be optimal locally, rather than safe globally.

Real coagulation (techniques for joining previously "compiled" but unrelated
pieces) is the subject of the rest of this work. Chapter 6 provides algorithmic details
for enlarging cohabitation and conflict relations, and for determining whether the
new relations are still consistent. Chapter 7 lays the groundwork for dealing with
inconsistencies, and shows that inconsistencies have one of two distinct forms, splits
and twists. Chapters 8 and 9 give techniques to deal with the two kinds of
inconsistency.


2.  A  Glimpse  of  the  Basic  Concepts 

In  this  chapter  we  give  a  brief  preview  of  various  ideas  used  in  building  the  code 
generator.  With  these  in  view,  the  more  detailed  descriptions  of  both  processes  and 
data  structures  will  be  better  motivated. 

2.1 Regions

We view the program as being represented by a traditional flowgraph. A region is
a subgraph of this flowgraph (possibly consisting of a single node) which has already
been compiled.


[Figure: a region within the flowgraph, with arcs not yet compiled.]


We  speak  of  the  compilation  of  nodes  and  arcs  separately.  Compilation  of  a  node 
produces  a  new  region,  consisting  solely  of  the  node.  This  compilation  yields  a 
sequence  of  machine  instructions  for  the  part  of  the  program  which  lies  in  the  node, 
as  well  as  other  data  associated  with  the  region.  Thus,  a  node  is  not  necessarily  a 
maximal  flowblock,  but  rather  the  largest  piece  of  program  which  can  conveniently 
be turned into a region. This might be no larger than a single operation, e.g.,
T <- V + U.

The compilation of an arc is the interesting part. This happens only when the
nodes at each end of the arc have been compiled, i.e., each node is in some region,
perhaps much larger than a single node. After an arc has been compiled it means
roughly that the code prior to this arc is compatible with the code following the arc:
for example, registers are compatibly assigned. If an arc is compiled both of whose
nodes already lie in the same region, it merely has the effect of adding an arc to a
region. This is called intra-region compilation. If the arc connects heretofore distinct
regions, then after it is compiled, there is only one region, consisting of the two
smaller ones, plus the added arc. This is referred to as inter-region compilation.

We  note  here  that  calls  on  a  routine  which  is  being  compiled  are  represented  (at 
least  conceptually)  by  graph  structure. 

[Figure: flowgraph of calls on f, and its graph representation.]


When an arc entering a subroutine graph is compiled, it means that the argument
connections (as well as register connections, etc.) are mutually understood by caller
and callee; similarly with an arc leaving a subroutine graph and result conventions.

The  execution  data  required  for  the  compilation  process  is  frequency  counts  on  the 
arcs  (not  merely  on  the  nodes).  Throughout  this  document,  the  frequency  of  an  arc 
will  mean  the  number  of  times  flow  passes  through  the  arc  during  one  execution  of 
the  program.  The  top-level  description  of  the  compiler  may  be  summarized  as 
follows:

Algorithm  2.1  Compilation 

for A <- each arc in order of decreasing frequency
    if the entry node to A is uncompiled then compile it
    if the exit node to A is uncompiled then compile it
    compile the arc A.
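
As an illustration only, the driver loop of Algorithm 2.1 can be sketched in Python. The function name `coagulate`, the union-find bookkeeping for regions, and the log strings are assumptions made for this sketch; the report does not prescribe a data structure, and the real work of compiling nodes and arcs is replaced here by log entries.

```python
from typing import Dict, Hashable, List, Tuple

Arc = Tuple[Hashable, Hashable]  # (node the arc leaves, node the arc enters)


def coagulate(arc_freq: Dict[Arc, int]) -> Tuple[List[str], Dict[Hashable, Hashable]]:
    """Visit arcs by decreasing frequency, compiling end nodes on demand.

    Returns a log of the steps taken and a map from node to region id.
    """
    parent: Dict[Hashable, Hashable] = {}  # union-find; a key means "compiled"
    log: List[str] = []

    def find(node: Hashable) -> Hashable:
        # Follow parent links to the region representative (with path halving).
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def compile_node(node: Hashable) -> None:
        # Compiling a node creates a new region consisting solely of that node.
        if node not in parent:
            parent[node] = node
            log.append(f"node {node}")

    for (src, dst), _freq in sorted(arc_freq.items(), key=lambda kv: -kv[1]):
        compile_node(src)
        compile_node(dst)
        a, b = find(src), find(dst)
        if a == b:
            log.append(f"intra-region arc {src}->{dst}")
        else:
            parent[b] = a  # the two regions become one
            log.append(f"inter-region arc {src}->{dst}")
    return log, {node: find(node) for node in parent}


# Hypothetical arc frequencies for a small loop B <-> C entered from A.
steps, regions = coagulate({("A", "B"): 10, ("B", "C"): 100,
                            ("C", "B"): 90, ("C", "D"): 10})
```

In this hypothetical flowgraph the busy loop between B and C is compiled first, and A and D are attached to the existing region afterwards, which is the ordering the text describes.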


2.2 Cohabitation and Conflict

Recall that the nodes in a region have compiled code associated with them. A line
of code consists of an opcode followed by zero or more operands. Whenever a
variable is one of these operands, it is referred to as an occurrence (i.e., of the
variable). Thus, a line of code might be ADD V,U. To refer to an occurrence of a
particular variable, we use the notation $V_n$, where n is often 1. Beware that $V_1$ and $V_2$
are not different variables, but different occurrences of the same variable. Thus, the
above line of code would usually be written ADD $V_1$,$U_1$, so that we could refer to the
occurrences $V_1$ and $U_1$. If we wish to talk about an occurrence without naming its
variable, we use $o$, possibly subscripted. The variable of an occurrence is denoted by
$v$, so "$o_1$ and $o_2$ have the same variable" is written $v(o_1) = v(o_2)$.

There are two important relations on occurrences. If two occurrences cohabit, it
means that the present code relies upon the fact that the two occurrences are in the
same memory location ("memory" here includes register and stack locations). Thus
cohabitation is an equivalence relation, partitioning occurrences into cohabitation
classes. Cohabitation is the only way in which the same memory location is referred
to by different instructions. The code corresponding to different (in the source
code) occurrences of the same variable may find "the variable" in quite different
places. Moreover, occurrences of different variables may cohabit, if that is a useful
optimization.

The second relation of interest is that of conflict. If two occurrences conflict, it
means that present code relies upon the fact that the two occurrences do not occupy
the same memory location. There is a simple rule relating cohabitation and conflict:

Two  occurrences  which  are  in  conflict  may  not  be  a  member  of  the  same 
cohabitation  class. 

This  is  called  the  consistency  rule;  much  more  will  be  said  about  it,  or  rather,  about 
inconsistency. 
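
A minimal sketch of the two relations and the consistency rule follows, assuming occurrences are named by strings such as "V1". The class name `Occurrences` and the union-find representation of cohabitation classes are illustrative choices, not the representation developed later in the report.

```python
class Occurrences:
    """Cohabitation as an equivalence relation plus a set of conflict pairs."""

    def __init__(self):
        self.parent = {}        # union-find forest over occurrence names
        self.conflicts = set()  # each entry is frozenset({occ_a, occ_b})

    def find(self, occ):
        # Representative of the cohabitation class containing occ.
        self.parent.setdefault(occ, occ)
        while self.parent[occ] != occ:
            self.parent[occ] = self.parent[self.parent[occ]]
            occ = self.parent[occ]
        return occ

    def cohabit(self, a, b):
        # The code relies on a and b being in the same location.
        self.parent[self.find(a)] = self.find(b)

    def conflict(self, a, b):
        # The code relies on a and b being in different locations.
        self.find(a)
        self.find(b)
        self.conflicts.add(frozenset((a, b)))

    def inconsistencies(self):
        # Consistency rule: conflicting occurrences may not share a class.
        return [tuple(sorted(pair)) for pair in self.conflicts
                if len({self.find(o) for o in pair}) == 1]


rel = Occurrences()
rel.cohabit("V1", "V2")   # two occurrences of V kept in one place
rel.cohabit("V2", "U1")   # an occurrence of a different variable joins them
rel.conflict("X1", "V1")  # X1 must live elsewhere; still consistent
```

Adding `rel.conflict("U1", "V1")` at this point would violate the consistency rule, and `rel.inconsistencies()` would report that pair.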

2.3 Supply and Demand Sets

Define an entry node of a region to be any node in that region which has an
uncompiled incoming arc; dually, an exit node is one which has an uncompiled
outgoing arc. A boundary node of a region is any entry node or exit node of the
region.


Following standard terminology, we call a variable live at some point in the
flowgraph if there is some execution path leaving that point along which the variable
is used before it is set. Each entry node for a region specifies the set of variables
which are live at entry to the node, and which can be seen to be live solely by
looking at the region. This set of variables is represented by a set of occurrences,
called the demand set. Similarly, each exit node for a region specifies the set of
variables which are live at exit from the node and which are mentioned in the region.
As before, this set of variables is represented by a set of occurrences, the supply set.
Either of these sets may be referred to as a boundary set. The calculation of
boundary sets presupposes a pass over the program to do live-dead analysis. Further,
the meaning of the phrase "mentioned in the region" is more subtle than one
might expect. The details of both these issues are discussed in chapter 4.

The  purpose  of  the  boundary  sets  is  to  aid  in  the  compilation  of  an  arc.  A 
variable  may  have  occurrences  in  both  the  supply  and  demand  sets,  in  which  case  the 
occurrences must be made to cohabit. It is also possible for a variable to have an
occurrence  in  the  boundary  set  of  only  one  of  the  regions.  Only  variables  that  are 
both  live  and  mentioned  in  a  region  are  necessary  in  the  coagulation  process,  and 
only  those  are  included  in  boundary  sets. 
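
The use of boundary sets when an arc is compiled can be sketched as follows. This is a simplified illustration: each boundary set is assumed to be a mapping from variable name to the single occurrence representing it, and the function name `match_boundary_sets` is hypothetical.

```python
from typing import Dict, List, Tuple


def match_boundary_sets(supply: Dict[str, str],
                        demand: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair up occurrences across an arc being compiled.

    supply: variable -> occurrence in the supply set at the arc's source node
    demand: variable -> occurrence in the demand set at the arc's target node
    Returns the occurrence pairs that must be made to cohabit, and the
    variables that appear in only one of the two boundary sets.
    """
    shared = sorted(supply.keys() & demand.keys())
    must_cohabit = [(supply[var], demand[var]) for var in shared]
    one_sided = sorted(supply.keys() ^ demand.keys())
    return must_cohabit, one_sided


# Hypothetical boundary sets: V is on both sides, X and Y on one side each.
pairs, one_sided = match_boundary_sets({"V": "V3", "X": "X1"},
                                       {"V": "V7", "Y": "Y2"})
```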

2.4 The Cost Metric

The  objective  function  for  the  optimization  process  is  the  cost  of  running  the 
program  on  the  same  data  which  generated  the  arc  frequencies  that  govern  order  of 
compilation. We assume this metric to be a bilinear function of average space $S$ and
total time $T$:

$$\mathrm{cost}(S, T) = a_1 \cdot S \cdot T + a_2 \cdot T + a_3 \cdot S + a_4$$

This formula covers most charging policies, and the coefficients have reasonable
interpretations:

$a_1$ is the cost per unit space per unit time
$a_2$ is the cost per unit time of the CPU
$a_3$ is a job surcharge for space
$a_4$ is a job submission charge

In decisions, we are interested in incremental cost:

$$\mathrm{cost}(S+s, T+t) - \mathrm{cost}(S, T) = a_1 \cdot (S \cdot t + T \cdot s + s \cdot t) + a_2 \cdot t + a_3 \cdot s$$

The term $s \cdot t$ will generally be much smaller than the other terms, and may be
ignored. Rearranging the rest, we have:

$$(a_1 \cdot T + a_3) \cdot s + (a_1 \cdot S + a_2) \cdot t$$

Whenever we are changing or adding an instruction, it is simple to determine the
extra space involved ($s$), and since we know the frequency of the instruction, we
know the extra time involved ($t$).
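
The incremental-cost algebra can be checked numerically. The sketch below assumes the bilinear cost function given above; the coefficient values and function names are arbitrary illustrations, and integers are used so that the comparison is exact.

```python
def cost(S, T, a):
    """Bilinear cost: a[0]*S*T + a[1]*T + a[2]*S + a[3]."""
    a1, a2, a3, a4 = a
    return a1 * S * T + a2 * T + a3 * S + a4


def incremental_exact(S, T, s, t, a):
    """cost(S+s, T+t) - cost(S, T), expanded term by term."""
    a1, a2, a3, _ = a
    return a1 * (S * t + T * s + s * t) + a2 * t + a3 * s


def incremental_linear(S, T, s, t, a):
    """The same increment with the small s*t term dropped and regrouped."""
    a1, a2, a3, _ = a
    return (a1 * T + a3) * s + (a1 * S + a2) * t


a = (3, 5, 7, 11)                  # arbitrary illustrative coefficients
S, T, s, t = 50_000, 100, 2, 4     # arbitrary illustrative sizes
exact = incremental_exact(S, T, s, t, a)
approx = incremental_linear(S, T, s, t, a)
dropped = exact - approx           # equals a[0] * s * t
```

The only difference between the two increments is the cross term, which is small whenever the added space and time are small relative to the totals.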

The missing important parameters are the overall space $S$ and overall time $T$.
Since the program has been run, there should be a known value for $S$. If it was run
in an interpretive environment, there will be some change in space due to
compilation; since at least one pass has been made over the program before
code-generation begins, the size of the source will be known. A little experience with
the compiler should give a reliable conversion factor from source size to object code
size, so $S$ can be estimated. Of course, if the program was run in its compiled form,
we have actual experience (probably space doesn't change too much with small
changes in the program). It must be remembered that most of $S$ may be space for
data, not for program, so that errors in the estimated size of the program do not
gravely affect the overall estimate.

The estimation of $T$ is trickier, but the same ideas apply. Since we can assume
that the source has frequency statements attached, a little experience should give a
usable estimate for the overall time of the compiled program. Direct experience with
previous compilations of the program being compiled will of course be more reliable.
But here we do not have a cushion analogous to the one that exists for space.


Once there are estimates for $S$ and $T$, we can obtain the basic time-space trade-off
factor

$$b = \frac{a_1 \cdot S + a_2}{a_1 \cdot T + a_3}$$

Thus, when changing the program in a way which uses $s$ extra units of space and is
executed $f$ times, the extra cost is proportional to:

$$s + c \cdot f \quad \text{where } c = b \cdot (\text{time to execute the instruction})$$

The expression $s + c \cdot f$ will appear often in this paper, as the generic cost of adding
or modifying an instruction. Realize that $s$ and $c$ are determined by the particular
instruction.


Using this analysis, we can gain some quantitative insight into the time-space
problem. For simplicity, assume $a_2 = a_3 = 0$ (or are negligible). Then $b = S/T$, so $a_1$
is irrelevant to studying trade-offs. Suppose that we have a 10K program, with 40K of
data, so $S = 50$K, and that the program runs in $T = 100$ seconds; thus $b = 500$ (with
units of words/sec). Suppose we are trying to decide between using 2 instructions
which take 2 µs each versus one instruction that takes 5 µs (for all instructions, $s = 1$).
Then, the two instruction sequence is preferable under the condition:

$$(2\ \text{inst}) \cdot (1\ \text{word/inst} + (500\ \text{words/sec}) \cdot (2 \cdot 10^{-6}\ \text{sec/inst}) \cdot f) < (1\ \text{inst}) \cdot (1\ \text{word/inst} + (500\ \text{words/sec}) \cdot (5 \cdot 10^{-6}\ \text{sec/inst}) \cdot f)$$

$$\iff f > 2000$$

The question is, how often do "typical" instructions get executed in 100 secs?
Programs are quite uneven in the distribution of their runtime. Assume that 90% of
the runtime accrues uniformly in 10% of the code, and 10% of the runtime accrues
uniformly in the other 90% of the code. Estimate the average instruction time as
2 µs, and compute the frequencies of the busy and non-busy parts of the program in
a 100 second execution.

busy frequency = 45000          non-busy frequency ≈ 555

Thus, we should use the two instructions in the busy part of the program, but only
the one in the non-busy part. One cannot help but wonder how many
programmers would exercise the correct judgment intuitively, and whether it would
be worth their time to do the calculation every time a question came up.
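
The worked example can be reproduced with a few lines of Python. This is a sketch under the example's own assumptions (one word per instruction, a 10K program taken as 10,000 instructions, $a_2 = a_3 = 0$); all function names are illustrative.

```python
def tradeoff_factor(space, time, a1=1.0, a2=0.0, a3=0.0):
    """b = (a1*S + a2) / (a1*T + a3); reduces to S/T when a2 = a3 = 0."""
    return (a1 * space + a2) / (a1 * time + a3)


def sequence_cost(n_inst, secs_per_inst, b, f, words_per_inst=1):
    """Generic cost s + c*f, summed over the instructions of a sequence."""
    return n_inst * (words_per_inst + b * secs_per_inst * f)


def break_even_frequency(b, long_seq, short_seq, words_per_inst=1):
    """Frequency at which a longer but faster sequence starts to pay off.

    Each sequence is (instruction count, seconds per instruction).
    """
    (n_long, t_long), (n_short, t_short) = long_seq, short_seq
    extra_space = (n_long - n_short) * words_per_inst
    time_saved_per_execution = n_short * t_short - n_long * t_long
    return extra_space / (b * time_saved_per_execution)


def part_frequency(total_secs, avg_inst_secs, runtime_share, code_share, program_words):
    """Average execution count of an instruction in one part of the program."""
    executions = total_secs / avg_inst_secs
    return runtime_share * executions / (code_share * program_words)


b = tradeoff_factor(50_000, 100)                        # words per second
f_star = break_even_frequency(b, (2, 2e-6), (1, 5e-6))  # about 2000
busy = part_frequency(100, 2e-6, 0.9, 0.1, 10_000)      # about 45000
idle = part_frequency(100, 2e-6, 0.1, 0.9, 10_000)      # about 555
```

Comparing `sequence_cost(2, 2e-6, b, f)` with `sequence_cost(1, 5e-6, b, f)` at the busy and non-busy frequencies gives the same decisions as the text: two instructions in the busy part, one in the rest.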

While this example shows that time-space trade-offs can arise in the selection of
instructions, the importance of such analysis will probably lie in deciding upon other
optimization strategies. For example, back-substitution of subroutines can be quite
expensive in terms of space. It can also produce dramatic time savings, particularly
when applied to small data-structure access routines which the programmer may have
defined for the sake of modularity. Using the analysis we have outlined above, it is
possible to decide which back-substitutions really do pay off. Another optimization
technique involving time-space tradeoffs is that of loop unrolling. Only when the
cost of space is accounted for does one know when to stop the unrolling process.

2.5 Register Allocation Representation

Associated with each region is a structure representing required and possible
register allocation. The "required" attribute means that this structure records which of
the cohabitation classes are assumed to be kept in which registers, while the "possible"
attribute means that a definite allocation is not given, only that from the structure it
is easy to assign registers in such a way that the presently generated code will work
(thereby providing an existence proof that allocation is feasible). For example, if a
machine has n identical registers, this data structure can simply be a set, of size not
exceeding n, of sets of cohabitation classes. Each of the sets of cohabitation classes
would be assigned to the same register. In this scheme, there is clearly a requirement
that any two members of the same set of cohabitation classes not be in conflict; any
set (of sets) of size less than or equal to n and obeying this requirement would specify
a correct register allocation.
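
For the case of n identical registers, the requirement above can be checked mechanically. The sketch below assumes cohabitation classes are named by strings and that conflicts between classes are given as unordered pairs; `is_valid_allocation` is a hypothetical helper, not part of the report's design.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def is_valid_allocation(groups: Iterable[Set[str]],
                        conflicts: Set[FrozenSet[str]],
                        n: int) -> bool:
    """Check a set of sets of cohabitation classes against n registers.

    groups: each group is the set of classes meant to share one register
    conflicts: unordered pairs of classes that must not share a location
    """
    groups = list(groups)
    if len(groups) > n:
        return False  # more groups than registers
    # No two classes sharing a register may be in conflict.
    return not any(frozenset(pair) in conflicts
                   for group in groups
                   for pair in combinations(group, 2))


conflicts = {frozenset(("c1", "c2"))}
ok = is_valid_allocation([{"c1", "c3"}, {"c2"}], conflicts, n=2)   # c1, c2 kept apart
bad = is_valid_allocation([{"c1", "c2"}, {"c3"}], conflicts, n=2)  # c1, c2 share a register
```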


3.  More  on  Cohabitation  and  Conflict 

In this chapter we lay out the general approach to the building and maintaining of
the cohabitation and conflict relations. Precise details will be given later; the idea
here is to show what kind of thinking motivates the details. We conclude with an
example showing an optimization which arises naturally in this code generator.

3.1 Calculation

We now give a rough description of when cohabitation and conflict are
established. Cohabitation arises in two different ways. The most obvious is the
flow of control from one occurrence of a variable to the next occurrence of the
variable. In the compiled code for a region, if flow of control can pass from one
occurrence of a variable to another occurrence of the variable without passing
through an intervening use of the variable, then the two occurrences must reference
the same location. As we stated earlier, the only way for this to happen is for the
occurrences to cohabit.

A less obvious way for cohabitation to arise is from the assignment of scalar
variables. This is in part a consequence of the dictum that code is generated in the
most efficient way for the smallest possible context. When confronted with a
statement of the form V <- U, the compiler takes the optimistic approach that
nothing at all has to be done here, because it can be arranged for V and U to occupy
the same location. This may seem insanely optimistic, but it is done not so much


Parameter pairing is modeled as assignment; the cohabitation of actual and
formal parameter means that the argument to the subroutine is left in
exactly the right place. To encourage this, the initial assumption is that it
is possible.

Since trivial assignments are optimized away, earlier phases in the compiler
need not worry about creating extra assignments, if that is a convenient
way to express a transformation.

The programmer's assignments may be necessary, but they may be in the
wrong place, from an optimization point of view. By assuming control
over them, it is easier for the compiler to produce better code.

In summary, the minimization of moving things around is one of the central
problems of low-level optimization. The compiler takes complete control of this, not
allowing itself to be constrained by the programmer's assignments. Thus the only way
that move instructions are generated is when all of the optimistic assumptions lead to
trouble, i.e., to inconsistency. Not surprisingly, inconsistencies can always be resolved
with move instructions; the problem is to do so as efficiently as possible, and
especially, to avoid moves when possible.

To discuss conflict we must first discuss generations. A generation is an
occurrence which is modified by an instruction. (We will call the left hand side of
an assignment statement a first use.) We will denote generations with an asterisk
superscript, as in V1*. This convention will also help in reading the generic machine
language used in the examples: the asterisked occurrence is the destination of the
instruction.

Conflict arcs are established when one occurrence is "propagated past" an
occurrence which is a generation. For example, consider the code sequence:

MOVE V1*,W1

ADD X1,Y1

The occurrences X1 and Y1 must both conflict with V1*, because V is being changed,
and by the semantics of the language this is not supposed to affect X and Y (assume
no sharing here). It is sufficient to establish conflict only when propagating past
generations, as we shall see in section 6.1.
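
The rule can be illustrated with a small Python sketch. It rests on assumptions made only for this illustration: occurrences are (variable, index) pairs, each instruction is reduced to its generation (if any) and the occurrences it reads, and the scan runs bottom-up over a straight-line sequence. The function name and data layout are hypothetical; the actual mechanism is the one deferred to section 6.1.

```python
def conflicts_from_scan(instructions):
    """Scan a straight-line sequence bottom-up, propagating demanded
    occurrences upward; record a conflict whenever one passes a generation.

    Each instruction is (opcode, generation_or_None, [occurrences read]).
    An occurrence is a (variable, index) pair.
    """
    demanded = []        # occurrences needed by instructions further down
    conflicts = set()
    for opcode, gen, uses in reversed(instructions):
        if gen is not None:
            # Occurrences of the generated variable are supplied right here,
            # so they stop propagating and do not conflict with it.
            demanded = [o for o in demanded if o[0] != gen[0]]
            for occ in demanded:
                conflicts.add(frozenset((occ, gen)))
        for u in uses:
            if u not in demanded:
                demanded.append(u)
    return conflicts


# The two-instruction sequence from the text; the ADD is modelled only by
# the occurrences it reads (its own destination is ignored in this sketch).
code = [
    ("MOVE", ("V", 1), [("W", 1)]),
    ("ADD", None, [("X", 1), ("Y", 1)]),
]
found = conflicts_from_scan(code)
```

Here X1 and Y1 are both propagated past the generation of V, so both end up in conflict with it, while W1 (read by the MOVE itself) does not.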

3.2 Representation

We have seen that cohabitation is an equivalence relation. We have also seen that
cohabitation arises locally, because of flow or assignment. For inconsistency
resolution, it is useful to keep track of the individual "reasons" for the existence of a
cohabitation class. We do this with a cohabitation graph, whose nodes are occurrences
and whose arcs arise from the flow of data from one occurrence to the next or from
assignments. It is convenient to let this be a directed graph, with arrows in the
direction of the flow of data. A cohabitation class corresponds to a connected
component of the cohabitation graph.
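
As a rough illustration of the last point, the following Python sketch computes cohabitation classes as the connected components of a small cohabitation graph. The occurrence names and arcs are invented for the example, and union-find is simply one convenient way to find components; nothing here is meant to describe how the classes are actually maintained by the code generator.

```python
def cohabitation_classes(occurrences, arcs):
    """Group occurrences into classes: the connected components of the
    cohabitation graph, ignoring the direction of the arcs."""
    parent = {o: o for o in occurrences}

    def find(o):
        # Follow parent links to the class representative, halving paths.
        while parent[o] != o:
            parent[o] = parent[parent[o]]
            o = parent[o]
        return o

    for src, dst in arcs:
        parent[find(src)] = find(dst)

    classes = {}
    for o in occurrences:
        classes.setdefault(find(o), set()).add(o)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical arcs, each directed along the flow of data.
occs = ["X0", "X1", "Z1", "Y1", "X2"]
arcs = [("X0", "X1"), ("X1", "Z1"), ("Y1", "X2")]
classes = cohabitation_classes(occs, arcs)
```

Keeping the individual arcs, rather than only the resulting classes, is what preserves the "reasons" that inconsistency resolution later needs.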

Since conflict is a relation on occurrences, it too may be viewed as a graph whose
nodes are occurrences. This graph and the cohabitation graph share node sets, so it is
often convenient to draw them on the same set of nodes, and distinguish arc types.


The pictures are:

[Figure: the arc styles used for cohabitation and for conflict]


These relations are not static during the course of a compilation, but are continually
adjusted as regions grow and coalesce. The cohabitation relation will be represented
in part by a graph. Because the conflict relation is very dense, its representation and
manipulation as a graph would be very expensive; instead, it is represented indirectly,
by means explained in chapter 6.


3.3 Costs on Cohabitation Arcs

When an inconsistency arises, it must be resolved. This is done on the basis of
costs assigned to each cohabitation arc, which we think of as the cost of "breaking"
the cohabitation. The problem of deciding what number to assign as the cost of a
cohabitation arc is more difficult than simply deciding whether to establish the arc.
This difficulty arises because to assign a number, it is necessary to anticipate how an
arc might be broken. We consider the details of this problem later, and focus here
on the general approach.

The breaking of a cohabitation arc usually involves adding some instruction(s), or
using more expensive variants of already generated instructions. If we knew what
these instructions were, we could use the cost metric described earlier to obtain the
appropriate cost for the arc. The problem is that at the time an arc is being
established, it is not worthwhile to determine these instructions. This is partly
because there is no reason to spend a lot of time trying to figure out how to break
arcs whose breaking will never be helpful in resolution, and partly because breaking
an arc takes place in the context of inconsistency resolution, so that the best way to
do it depends upon a larger context. Rather than try to obtain the cost exactly, what
is done is to get a good lower bound on the cost of breaking the arc, i.e., we
underestimate the cost of resolution, but by as little as possible. When the time
comes to resolve an inconsistency, the approach is as follows:

1. A set of arcs to be broken is chosen on the basis of the costs on the arcs.

2. Given the set of arcs in step 1, the precise modifications are determined
and the precise cost of breaking this set is calculated. If this turns out to
be much more than expected, the modifications are remembered, but
step 1 and this step are repeated, looking for a better set to break.

3. Eventually, the step 1-2 loop stops, and we pick the set with minimum
actual cost.

If, in step 2, we discover that an arc is more expensive to break than was originally
anticipated, the cost of the arc may be revised, so that future calculations in step 1
will have a more accurate view of things. This rise in costs is one of the ways in
which the step 1-2 loop terminates: eventually, the actual cost is close to the
estimated, and the estimated cost is known to be about as good as possible. So we use
the modifications which have been calculated.
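
The loop just described can be sketched in Python as follows. This is only an illustration under several assumptions of its own: the candidate sets of arcs are taken as given, "much more than expected" is modelled by a hypothetical `slack` factor, and the precise cost of a set is taken to be the sum of per-arc costs supplied by a caller-provided function. None of these details come from the report.

```python
def resolve(candidate_sets, estimate, actual_arc_cost, slack=1.5):
    """Pick a set of cohabitation arcs to break.

    candidate_sets : list of frozensets of arcs, each of which would
                     resolve the inconsistency (assumed to be given)
    estimate       : dict arc -> lower-bound cost, revised in place
    actual_arc_cost: function arc -> precise cost of breaking the arc
    slack          : how far above the estimate counts as "much more"
    """
    tried = {}  # candidate set -> precise cost (the remembered modifications)
    while True:
        # Step 1: choose the set that looks cheapest under current estimates.
        best = min(candidate_sets, key=lambda s: sum(estimate[a] for a in s))
        if best in tried:
            break  # the estimates can no longer suggest anything new
        expected = sum(estimate[a] for a in best)
        # Step 2: work out the precise cost of breaking this set.
        precise = {a: actual_arc_cost(a) for a in best}
        tried[best] = sum(precise.values())
        # Revise optimistic estimates so later passes see the real costs.
        for a, cost in precise.items():
            estimate[a] = max(estimate[a], cost)
        if tried[best] <= slack * expected:
            break  # close enough to the estimate; stop searching
    # Step 3: take the remembered set with minimum actual cost.
    return min(tried.items(), key=lambda kv: kv[1])


# Hypothetical data: arc "a" looks cheap but turns out expensive to break.
estimate = {"a": 2, "b": 3, "c": 4}
actual = {"a": 10, "b": 3, "c": 4}
candidates = [frozenset({"a"}), frozenset({"b", "c"})]
chosen, cost = resolve(candidates, estimate, actual.get)
```

In this run the first choice, {a}, costs far more than estimated, so its estimate is raised and the second pass settles on {b, c}. Because every pass either stops or evaluates a set not tried before, the loop ends after at most one pass per candidate.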

3.4 An Example

At this point, we offer an example which shows how cohabitation, conflict, and
inconsistency resolution interact when compiling a program. The reader will have to
accept some statements on faith, such as costs on arcs, choice of arcs to break, and
how the compiler chooses to implement the breaks.

The example we choose is a conditional exchange, i.e., a statement such as:

IF ... THEN Z <- X; X <- Y; Y <- Z ENDIF ...

By the time the code generator sees this, we may assume we are dealing with the
flowgraph fragment and conflict relation (numbers near the arcs denote their
frequency):


The most frequent arcs are compiled first, so say that the two arcs touching the
exchange box are compiled next. The first of these will cause the box itself to be
compiled, resulting in cohabitation arcs and boundary sets below. CHB (cohabit) is a
pseudo-op that provides a place for the occurrences. It requires no space in the
eventual machine code, and no time to execute. The actual cohabitation information
is in the graph on the right. (The number on a cohabitation arc denotes the cost of
breaking it.)


demand X1,Y1

CHB Z1,X1
CHB X2,Y1
CHB Y2,Z2

supply X2,Y2

[Figure: cohabitation graph for the exchange box, with the cost of breaking marked on each arc]

Compiling the frequent arcs will result in nothing more than establishing cohabitation
arcs among matching elements of supply and demand sets. This results in the
following overall relations:


[Figure: overall cohabitation and conflict relations after compiling the two frequent arcs]

Note that no inconsistency has arisen yet. This happens when the remaining arc
(with frequency 2) is compiled. The graph after compilation, but before resolution,
is as follows:


[Figure: cohabitation and conflict graph after compiling the frequency-2 arc, before resolution]

It is clear that the cheapest way to resolve this is to break the X0-X3 and Y0-Y3 arcs.
This is done by move instructions placed on the infrequent arc. To avoid ordering
problems, and to exploit instructions which move several quantities at once,
it is convenient to postulate a "simultaneous move" instruction which we place on the
arc. This results in the new flowgraph fragment and new relations (X4, X5, Y4, and
Y5 are new occurrences for the new instruction):


Remember that what actually gets "moved" is cohabitation classes, of which we
presently have two, c1 and c2 as labeled above. Writing the simultaneous move in
these terms, we have MOVE <c1, c2>, <c2, c1>. This begs to be compiled as an
exchange instruction, if one is available. And note that the programmer wrote the
exchange on the other arc.
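
To make the ordering problem concrete, here is a small Python sketch of one way a simultaneous move could be turned into ordinary instructions. It is an illustration only, not the method of this code generator: locations are plain strings, the function name is invented, and "XCHG" stands for a hypothetical exchange instruction (a machine without one would need a scratch location instead).

```python
def sequentialize(moves):
    """Turn a simultaneous move {dst: src} into an ordered instruction list.

    Every destination receives the value its source held before any of the
    moves took place.
    """
    pending = {d: s for d, s in moves.items() if d != s}  # drop no-ops
    code = []
    while pending:
        # A destination that no pending move still reads can be written now.
        free = [d for d in pending if d not in pending.values()]
        if free:
            d = free[0]
            code.append(("MOVE", d, pending.pop(d)))
            continue
        # Only cycles remain: swap one pair, which settles destination d.
        d, s = next(iter(pending.items()))
        code.append(("XCHG", d, s))
        del pending[d]
        # The value that used to be in d now lives in s.
        for other, src in pending.items():
            if src == d:
                pending[other] = s
        pending = {k: v for k, v in pending.items() if k != v}
    return code


swap = sequentialize({"r1": "r2", "r2": "r1"})
chain = sequentialize({"a": "b", "b": "c"})
cycle = sequentialize({"a": "b", "b": "c", "c": "a"})
```

A two-way swap collapses to a single exchange, which is the situation in the example above; a simple chain needs only correctly ordered moves, and a longer cycle takes one exchange fewer than its length.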


4.  Extra  Occurrences 


4.1 Motivation

There is a tradeoff regarding the construction of the cohabitation and conflict
relation. On the one hand, it would be ideal if the mere selection of a set of arcs
would indicate exactly where to place the moves to break the set. But there is so
much choice in where to place moves that any data structure which would achieve
this goal would be huge. On the other hand, if the relations don't give a reasonably
good idea as to where the moves go, it means that as the relations are being
constructed, one doesn't really have a very good idea of what the costs of breaking an
arc are. This leads to either missed optimization opportunities (if over-estimation
occurs), or extra expense in resolving inconsistencies (if underestimation occurs).
Further, if the cohabitation and conflict information doesn't give a good idea about
where to place moves, the calculation of precisely where to place them may be
extremely expensive.

The compromise we use is an indirect one. Rather than associating with a
cohabitation arc some kind of data indicating how to place the moves, we add a few
extra occurrences of variables to the flowgraph. These occurrences will appear in the
cohabitation and conflict relations just like "legitimate" occurrences. The placement
of these occurrences in the flowgraph is related to the likely places for
inconsistency-resolving moves, so that they aid in determining where to place these
moves. We noted earlier that cohabitation arcs are directed. The reason for this is
to further aid in the determination of where to place moves.

4.2 Merge and Split Occurrences

Some  extra  occurrences  of  a  variable  may  be  added  at  points  where,  relative  to  the 
variable,  flow  splits  or  merges. 

Definition. Given a variable V, a V-merge node N is one which can be reached from
different occurrences of V along paths whose intersection consists solely of the node
N. A V-split node N is defined dually, i.e., is one from which distinct occurrences of
V may be reached along paths which intersect only at N. □


The calculation of V-merge and V-split nodes can be solved by using Tarjan's
techniques for path-problems on directed graphs [6].

Not every V-merge and V-split node has an extra occurrence of V. The precise
rules are:

M If N is a V-merge node and V is live at entry to N, there is an occurrence of
V before the first change to a variable.

S If N is a V-split node and V is live at exit from N, there is an occurrence of V
after the last change to a variable.

If necessary, extra occurrences are placed under the pseudo-op USE. The rationale
for extra occurrences will become clearer as the techniques of cohabitation/conflict
calculation, cost estimation, and move placement are discussed. We now give some
hints as to why these seem to be the proper concepts. Consider the following
program segment:


Suppose that the upper two uses of V are inconvenient to keep in the same place.
The place to fix this is on one of the arcs coming into N, which forces V to be in

Note that V is live at the bottom of N, so N is a V-split node, and an extra occurrence
of V is placed at the bottom of N; call this occurrence V4. In the compilation of the
arc entering N, there would be a cohabitation arc from V2 to V4, which we denote
V2->V4. Similarly, compilation of the arc exiting N on the right would produce the
cohabitation V4->V3. Suppose it became necessary to break the cohabitation chain
V2->V4->V3. If each of the arcs exiting N has a non-zero frequency and flow is
conserved, the arc leaving N has a lower frequency than the arc entering N. Thus
the cost of V4->V3 is less than the cost of V2->V4, and the place to break the chain is
V4->V3. This gives a good idea where to place the move: along the arc from N to the
occurrence V3. Without the extra occurrence one would have only the cohabitation
V2->V3. If it must be broken, it is harder to figure out where to put the moves. It is
also more difficult to come up with a general way of estimating the cost of breaking
a cohabitation.

4.3 Intermediate Subgraphs

This  section  describes  in  more  detail  how  the  insertion  of  extra  merge  and  split 
occurrences  limits  the  part  of  the  program  involved  when  breaking  a  cohabitation.  In 
order  to  discuss  this,  we  assume  that  the  program  graph  has  a  single  entry  and  single 
exit  node.  We  then  use  the  standard  graph  terminology. 

Definition. A node (or arc) N1 dominates a node (or arc) N2 when N1 is on every
path from the entry to N2. Dually, N1 back-dominates N2 when N1 is on every path
from N2 to the exit. □
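
For readers who want to experiment with these notions, the following Python sketch computes dominators for nodes with the standard iterative set-intersection method, and obtains back-dominators by running the same routine on the reversed graph. The graph encoding (a dict from node to successor list, with every node present as a key and reachable from the entry) and the function names are assumptions of this sketch; the report does not prescribe an algorithm here.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it} for a flowgraph.

    succ maps every node to a list of its successors; each node is
    assumed to be reachable from entry.
    """
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, outs in succ.items():
        for m in outs:
            pred[m].add(n)

    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            # n is dominated by itself plus whatever dominates all its
            # predecessors.
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Reverse every arc, so that back-dominance becomes dominance."""
    rev = {n: [] for n in succ}
    for n, outs in succ.items():
        for m in outs:
            rev[m].append(n)
    return rev


# A diamond-shaped flowgraph with a single entry and a single exit.
flow = {"entry": ["a", "b"], "a": ["join"], "b": ["join"],
        "join": ["exit"], "exit": []}
dom = dominators(flow, "entry")
back_dom = dominators(reverse(flow), "exit")
```

In the diamond, neither branch node dominates the join, while the join and the exit back-dominate everything above them.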


We wish to extend dominator terminology to occurrences. We do so by defining a
relation on paths and occurrences, so that any graph-theoretic notion defined in
terms of paths will extend to occurrences. We first need the notion of one
occurrence lying above or below another occurrence in the same node (they may also
be unordered). This is usually clear in any given context, and is not formalized
further here. But using it, we have:

Definition. When we say that an occurrence o is on a path from an occurrence p to
an occurrence q we mean both the following:

If o is in the same node as p, it is below p, otherwise o is in some node on
the path from the node of p to the node of q.

If o is in the same node as q, it is above q, otherwise o is in some node on
the path from the node of p to the node of q.

□

Note that if p and q are in the same node, this says that o is below p and above q.
With this, we introduce an idea that is used throughout this paper.

Definition. Let p, q be any two occurrences (or nodes or arcs) of a program. The
intermediate subgraph of p and q, written G(p, q), consists of occurrences and arcs
dominated by p and back-dominated by q, and of all nodes touched by the arcs.

□

We usually deal with intermediate subgraphs of p and q where p and q are different,
but nearby, occurrences of the same variable. The live region of a variable partitions
naturally into certain of these subgraphs, because of the insertion of extra
occurrences. Before stating the main result, we have some preliminaries.

Definition. Let V be a variable of the program. A V-free path is one which has no
occurrences of V in any of its nodes, except perhaps for the beginning and/or end
occurrence, if the path begins and/or ends on an occurrence.

□

Definition. An occurrence V1 V-dominates an occurrence (or node or arc) q if V1
dominates q and if every V-free path from an occurrence of V to q begins at V1.

□

In the next result and throughout this paper, we assume that any variables which are
live at the entry to the flowgraph have an occurrence there (at least conceptually),
and that V-merge nodes are calculated accordingly.


Lemma 4.1 Suppose there is a V-free path from an occurrence V1 to an arc A (resp.
a first use o), and that V is live on A (resp. at o). Then V1 V-dominates A (resp. o).

Proof. We consider the arc case, first showing that V1 dominates A. Suppose there
is a path from the entry to A which avoids V1. This path must intersect the
V-free path from V1 to A at some node above A. Pick the nearest such node to V1.
This will be a V-merge node, because there are disjoint paths from distinct
occurrences of V to this node. But a V-merge node must contain an occurrence of V,
and by the V-freeness of the path, the node must be that containing V1. But this
contradicts the choice of the path to avoid V1. The only possibility is that V1
dominates A.

To show that V1 V-dominates A, we must show that every V-free path from an
occurrence of V to A starts at V1. Suppose instead there is a V-free path to A from
V2. This path intersects with the path from V1 to A in the hypothesis of this Lemma
at a node above A, causing a contradiction as in the above paragraph. This proves
the lemma in the arc case.

We next consider the lemma for a first use o. The proof here is essentially the
same as for an arc, the difference being that when the paths intersect, it will be at a
V-merge node, so an occurrence of V will appear above any first use in the node, in
particular, above o. Thus, we do not have a V-free path from V1 all the way to o. □

Definition. An occurrence V1 V-back-dominates an occurrence (or node or arc) q if
V1 back-dominates q and if every V-free path from q to an occurrence of V ends at
V1. □

Lemma 4.2 Suppose there is a V-free path from an arc A (resp. a first use o) to a
non-first occurrence V1. Then V1 V-back-dominates A (resp. o).

Proof. Exactly dual to the proof of Lemma 4.1.

□

The main result of this section is this:

Theorem 4.1 Suppose the variable V is live on the arc A (resp. at first use o). Then
A (resp. o) lies in a unique minimal intermediate subgraph of the form G(Vi,Vj).

Proof. Look at any path from the entry to the exit through A or o. Let V1 be
the last occurrence of V on this path before A or o, and let V2 be the first occurrence
of V on this path after A or o (there must be one, else V is not live on A or at o). By
Lemma 4.1, V1 dominates A or o, and by Lemma 4.2, V2 back-dominates A or o; thus A
or o is in G(V1,V2). This proves the existence of V1, V2.

To prove uniqueness, assume A or o also lies in G(Vi,Vj). By Lemma 4.1, any path
from Vi to A or o must go through V1. Thus G(Vi,Vj) contains G(V1,Vj), and is
strictly bigger if i ≠ 1. By minimality, then, i = 1. Dually, we conclude j = 2. □

We note that this Theorem does not hold if the phrase "arc A" is replaced by
"node N" or if we allow arbitrary occurrences o, rather than just first uses. For
example, a V-merge node containing Vj as the first use of V will be in G(Vi,Vj) for
several different i. On the other hand, Vj itself does not belong to G(Vi,Vj) for any i
since it is not dominated by Vi; the same may be said for all occurrences above Vj
(but in Vj's node). While the partitioning of arcs and first uses is invariant in what
follows, the lack of this for nodes and for general occurrences is not a problem.

A further consequence of the way that intermediate subgraphs divide up the
flowgraph is the following.

Corollary 4.1 Let S be any connected subgraph of the flowgraph which has no
occurrence of V, but in which V is live at some place. Then V is live in all of S and
all of the arcs of S belong to the same minimal intermediate subgraph G(Vi,Vj).

Proof. We use induction on the number of arcs. If this number is zero, the result is
vacuous. If there is at least one arc, pick one, and by Theorem 4.1, choose V1 and
V2. Suppose there is some arc for which the result does not hold. Either V is dead
along the arc or the minimal intermediate subgraph is different, i.e., is of the form
G(Vi,Vj), where i ≠ 1 or j ≠ 2. By the connectedness of S we may choose this "bad"
arc so that it shares a node with some "good" arc whose subgraph is G(V1,V2). There
are four possibilities.


The crucial role of extra occurrences in allowing this type of propagation is
demonstrated by the following counter-example. The point is that without extra
occurrences, the situation would look exactly like the above one, if attention is
restricted to what is already in a region. However, in the example below, V is not live
throughout the region; in particular, it is not live at the bottom of the lower right
node. Thus we could not correctly add V1 to the supply set of that node.


In the scheme we have proposed, N2 would have split and merge occurrences of V
added, so that there would be some demand occurrence of V at the node pointed to
by the arc being compiled, before compilation of the arc begins.

4.4 Deadly Occurrences

It may happen that a variable in a program is live at the bottom of a split node,
but dies out of one or more of the arcs of the node. In this situation, it is necessary
to know in which direction the variable lives and in which it dies. One possible way to
solve this problem is to provide some data structure on arcs which yields this kind of
information. However, rather than complicating things with a new type of data
structure, we use extra occurrences. The idea is to let the variable live on all the arcs
out of the split node, and to kill the variable where necessary by placing an extra
occurrence under the pseudo-op KILL. The picture is:


The KILL V4 indicates to a forward scan that V is not needed, something that would
not otherwise be known until the cut set of assignments is detected. This is one
more way in which extra occurrences are used to aid later scanning. As we will see,
there is no harm in pretending that V is live on the arc where it is really dead.


5.  Compiling  a  Node 

We consider the compilation of the smallest regions, i.e., nodes. Such a
compilation produces a kernel region. The kernel regions are the repository of all
machine code for the program. First we consider the input to this part of the
code-generator. We have already assumed that the code-generator works from a
graph representation of the program; in this chapter we will need to make some
additional assumptions about the contents of the nodes, which together with the
graph itself, constitute the intermediate form for the program. Each node will have a
machine-independent, but nevertheless "primitive" operation. The earlier phases of
the compiler may introduce temporary names, so that a statement from the source
like U <- X*Y + Z will appear in the intermediate form as:

T <- X * Y      or      T1 <- X * Y
U <- T + Z              T2 <- T1 + Z
                        U <- T2

The intermediate form will contain occurrences, so that if we had been following our
usual notation, all the variables above would be written with subscripts. The
code-generator will commonly appropriate the occurrences of the intermediate form
for use in the machine code that it produces.
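
The introduction of temporary names by the earlier phases can be pictured with the short Python sketch below. It is an illustration only: the tuple encoding of expressions, the function name `lower`, and the T1, T2, ... naming scheme are assumptions of this sketch and say nothing about how the actual front end works.

```python
import itertools


def lower(target, expr):
    """Flatten a nested expression into single-operator statements.

    expr is either a variable name (str) or a tuple (op, left, right).
    Returns a list of (dest, op, left, right); op is None for a plain copy.
    """
    counter = itertools.count(1)
    out = []

    def walk(e):
        if isinstance(e, str):
            return e
        op, left, right = e
        l, r = walk(left), walk(right)
        temp = f"T{next(counter)}"      # fresh temporary for this operator
        out.append((temp, op, l, r))
        return temp

    # The final copy into the target is a trivial assignment.
    out.append((target, None, walk(expr), None))
    return out


# U <- X*Y + Z
statements = lower("U", ("+", ("*", "X", "Y"), "Z"))
```

This produces the second of the two forms shown above. The trailing copy U <- T2 is exactly the kind of trivial assignment that the optimistic treatment of cohabitation is expected to remove.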

This chapter may be viewed as a further specification of what it means to
"compile the node", as stated in the compilation algorithm (page 7). Although
computation cost may ultimately force a special means for turning maximal
flowblocks into regions, we consider here regions arising from single statements of
the intermediate form. For each kernel region, we want:

Boundary  sets  (remember  that  the  node  is  both  an  entry  and  an  exit  node 
for  the  region). 

The  cohabitation  and  conflict  relations  among  all  occurrences  of  the 
region. 

The "machine code" for the region (quoted, because it may not be exactly
machine code and because it may contain pseudo-ops).

Each of the sections of this chapter will consider certain types of kernel regions,
and will provide invariant assertions about regions. The basis for the inductive proof
of these assertions is that they hold for kernel regions.



5.1 Extra Occurrences

In this section we will treat the extra occurrences whose placement was described
in the previous chapter. All of the extra use-occurrences at the top of a merge node
(or bottom of a split node) are collected and placed under a single pseudo-op. In this
case, the intermediate form simply becomes the "machine code" when the node is
compiled. In the picture below, the rounded box represents the newly constructed
region. It has one node, the rectangular box. Since this node must have entering and
exiting arcs, necessarily not yet compiled, it will be an entry and exit node to the
region, with the supply and demand sets as shown.


We now state explicitly the invariants describing precisely when an occurrence is in a
boundary set.

BND If a variable is live at the bottom of an exit node, and if an occurrence of
the variable appears anywhere in the region, then some occurrence of the
variable will be in the supply set of the exit node; and conversely.

If a variable is live at the top of an entry node, and if an occurrence of
the variable appears anywhere in the region, then some occurrence of the
variable will be in the demand set of the entry node; and conversely.

The correctness of these invariants for the above kernel region follows because
USE Vj appears only where V is live.

Killing occurrences are similar to use occurrences, but according to BND, they
contribute only to demand sets:


This  is  the  most  trivial  possible  region. 


5.2 Assignments

Recalling that we take an optimistic view of assignments (no code need be
generated), we set up a pseudo-op to hold the occurrences and construct a
cohabitation arc to indicate our assumption. Recalling the pseudo-op CHB, we have:


The cost to break this arc is indicated as s + c·f, where s and c are the space and time
factors for a move instruction, and f is the frequency of the assignment. The means
of breaking this cohabitation arc is to change the CHB to some move opcode, thereby
adding space to the code and time to the execution.
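
As a small worked illustration of this formula, the Python fragment below evaluates s + c·f for two arcs. The particular values chosen for s and c are invented for the example; only the shape of the formula comes from the text.

```python
def break_cost(space, time, frequency):
    """Estimated cost of breaking a cohabitation arc with a single move:
    the space the move occupies plus its time weighted by how often it runs."""
    return space + time * frequency


# Hypothetical space and time factors for a register-to-register move.
s, c = 1, 2
cost_frequent = break_cost(s, c, 40)  # arc executed 40 times
cost_rare = break_cost(s, c, 2)       # arc executed twice
```

With these figures the frequently executed arc costs 81 to break and the rare one only 5, which is why resolution prefers to place moves on infrequent arcs.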

It should be noted that s and c are chosen optimistically. For example, on a
register machine, they would be chosen on the basis of a register to register move
instruction. If it becomes necessary to break the cohabitation, it might then be
discovered (in the context of a by now larger region) that one or both of V1 and U1
are not registers. Thus the actual cost would turn out to be much larger than
expected. As pointed out in section 3.3, this particular break in the cohabitation
relation might not be used after all; if it is used, the cost of breaking it would be
upped in light of the new knowledge.

The above kernel region has a single node, with associated supply and demand sets,
just like the kernel regions of the previous section. It also has a cohabitation graph.
Technically, so did the regions of the previous section, but those cohabitation graphs
had no arcs, so we didn't discuss them. This non-trivial cohabitation graph provides
an opportunity to introduce the invariant to be obeyed by the cohabitation graph of
a region. As with the BND invariant, this invariant is formulated with respect to
what can be seen in the region. We reason as follows. Suppose we begin an
execution at some entry point to the region, and follow it to an exit node. The only
thing which can influence the execution is the values of the variables in the demand
set of the entry node, plus any inputs obtained along the execution path. The only
lasting effect of this execution path is outputs, and the values of the variables in the
supply set of the exit node. The semantics of the source language will specify what
execution path will be taken, given the values for variables in the demand set and
inputs along the path, and will specify the outputs and the values of the variables at
any point. In particular, at the exit node, the semantics will specify the values of the
variables in the supply set. With this region-restricted view of correctness, we arrive
at the following invariant for cohabitation.

CHB The machine code for a region is correct for execution within the region,
provided that all the occurrences in one cohabitation class are assigned to
the same location.

We examine this invariant for the above region. The only demanded variable is U,
and the semantics of the language prescribe that V and U have the same value after
V <- U is executed. If V and U occupy the same location, they must have the same
value upon exit. The invariant holds.

5.3 Computations

It is here that the most interesting cases arise. We shall consider as our
fundamental example the source construct V + U. To make things interesting, assume
that a machine add instruction has only two operands and always destroys one of
them. In the case where V and U are last uses, we have the following compilation of
T <- V + U (recall that * denotes a generation, so the instruction below adds the
second operand to the first):

demand V1, U1
supply T1

Since V1 is a last use, we can make the optimistic assumption that V1 and T1 can
occupy the same location, thus leading to the cohabitation V1 -> T1. It is evident
that the CHB invariant holds. Since V1 and U1 are last uses, and T1 a first use, we
also see that the BND invariant holds.


The rationale for the 0 cost of the cohabitation is that conceivably, an
inconsistency could be resolved by interchanging the two operands of the +, so that
U1 cohabits with T1.

Next, let us suppose that neither V1 nor U1 is a last use. Then any simple ADD will
destroy a needed variable. It might seem that a sequence such as the following is
necessary:


But, on some machines, such as the PDP-10, there are instructions which leave a
result in two places or which can do a move while doing some other operation. Thus
it might be possible to have two copies of a variable at this point, and to clobber only
one. Since the strategy here is always to be optimistic, in this case we could generate
as code only a simple ADD, demand two copies of a variable, and set up a conflict
relation to indicate what the problem is.


The pseudo-op USE is employed to provide a place for a new occurrence of V. The
costs on the cohabitation arc should be 0, because of the possibility that even if two
copies of V cannot easily be made available, two copies of U can be. (On a machine
which does not freely generate copies, such a technique is of course not worthwhile.)
On a three operand machine such as the VAX, we could use all three operands:
ADD_3 V1,U1,T1. On many machines, the MOVE-ADD sequence will be the best
possible code.

Finally, in the case that exactly one of the variables is a last use, we generate the
following region (without loss of generality, assume V1 is a last use, U1 is not):


The cost of breaking this cohabitation arc depends upon what might be possible if
this optimism doesn't work out: availability of multiple copies, three-operand
instructions, or only a MOVE-ADD sequence.


In the last three kernel regions pictured above, we have inserted a non-trivial
conflict relation into the region, without saying precisely what correctness is. The
invariant here is quite similar to CHB; it simply formalizes the notion that conflict
prohibits excessive cohabitation.

CON  The  machine  code  for  a  region  is  correct  for  execution  within  the  region, 
provided  that  conflicting  occurrences  are  not  assigned  to  the  same 
location. 

Implicit in the invariant is that different cohabitation classes can be assigned to the
same memory location (especially the same register), so long as they do not conflict.

The correctness of CON for all the kernel regions of this section follows from a
simple rule that has been observed in establishing kernel conflict:

Whenever an occurrence is both in the supply set and the demand set of a
kernel region, it is in conflict with all the generations of that region.

We have chosen a commutative operation such as ADD to show how the basic
scheme allows examination of the various ways of compiling the operation. We
consider briefly the construct T <- V - U, where the machine has no reverse
subtract instruction. The first attempt at compiling this is SUB V*, regardless of
whether V1 is a last use. The difference between this and the ADD case is in the
accompanying data structure, particularly the cost on the arcs. It always hurts to
break V1 -> T1, because it cannot be done with a simple change to an instruction.
(Ignore the possibility that it might be reasonable to calculate the negative of the
derived quantity, and correct things later, as in U - V GT 0.) Here are the four
cases, under the assumption that the machine can freely generate copies:




6.  Compiling  an  arc 

We  finally  come  to  the  interesting  part  of  compilation — compiling  an  arc.  The 
following  picture  represents  the  generic  situation. 

N1 is an exit node of the region R1
supply p1, p2, ...
arc A with frequency f
demand q1, q2, ...
N2 is an entry node of the region R2

In general R1 may have other exit nodes, and N1 may have other arcs leaving it,
some compiled, some not; dually for R2 and N2. The occurrences supplied by N1
(resp. demanded by N2) are denoted p1, p2, ... (resp. q1, q2, ...). This notation does
not name the variable of an occurrence p; rather, the notation v(p) is used. Note
that R1 may be equal to R2.

Our goal in this chapter is to devise the proper adjustments to the data structures
(boundary sets, cohabitation and conflict relations, and machine code) so that after
inclusion of the arc A all of the invariants of the previous chapter remain true.
Naturally, we inductively assume the invariants for R1 and R2.

This chapter is an elaboration of what it means to "compile the arc" (compilation
algorithm, page 7). The process of compiling an arc is driven by the members of the
boundary sets, denoted above by p1, p2, ... and q1, q2, ... . The members of each of
the two boundary sets divide into those that share a variable with an occurrence of
the other boundary set (these are called matched occurrences) and those that do
not. In broad terms we have:


Algorithm 6.1 Arc-compilation

for p,q <- matched occurrences along arc A
    Match p,q
for p <- unmatched occurrences of N1
    Propagate p throughout R2
for q <- unmatched occurrences of N2
    Propagate q throughout R1
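
The control structure of Algorithm 6.1 can be sketched in Python. This is only an illustration under stated assumptions: the `Occurrence` record, the function names, and the callback interface are hypothetical and do not come from the report; the sketch also assumes the inter-region case, where propagation applies.

```python
from collections import namedtuple

# Hypothetical representation: an occurrence is a (variable, index) pair,
# so Occurrence("V", 1) stands for V1 and its variable v(p) is "V".
Occurrence = namedtuple("Occurrence", ["var", "index"])


def partition_boundary(supply, demand):
    """Split the boundary sets along an arc into matched pairs and unmatched occurrences."""
    supply_vars = {p.var for p in supply}
    demand_vars = {q.var for q in demand}
    matched = [(p, q) for p in supply for q in demand if p.var == q.var]
    unmatched_supply = [p for p in supply if p.var not in demand_vars]
    unmatched_demand = [q for q in demand if q.var not in supply_vars]
    return matched, unmatched_supply, unmatched_demand


def compile_arc(supply, demand, match, propagate):
    """Drive caller-supplied match and propagate actions in the order of Algorithm 6.1."""
    matched, unmatched_supply, unmatched_demand = partition_boundary(supply, demand)
    for p, q in matched:
        match(p, q)
    for p in unmatched_supply:
        propagate(p, "R2")   # supplied by N1, variable absent from the demand set of N2
    for q in unmatched_demand:
        propagate(q, "R1")   # demanded by N2, variable absent from the supply set of N1


# Example: N1 supplies V1 and U1; N2 demands V2 and W1.
log = []
compile_arc(
    [Occurrence("V", 1), Occurrence("U", 1)],
    [Occurrence("V", 2), Occurrence("W", 1)],
    match=lambda p, q: log.append(("match", p, q)),
    propagate=lambda occ, region: log.append(("propagate", occ, region)),
)
```

In the example, V1 and V2 are matched because they share the variable V, while U1 and W1 are unmatched and are handed to the propagation step.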

The next two sections consider what it means to "propagate" an occurrence
throughout a region, and what it means to match occurrences. Subsequent sections
consider some derivative problems.

6.1 Propagation of an Occurrence

We consider the case in which the variable of some occurrence in the supply set is
not the variable of any occurrence in the demand set, or dually, i.e., interchange the
words "supply" and "demand". For simplicity, we talk about only the missing demand
variable case, and do not continually make dual remarks for the missing supply
variable case.

Using BND inductively, we conclude that R2 has no occurrence of V, and
consequently, that R1 ≠ R2. Thus the new region will be R = R1 ∪ {A} ∪ R2. Then
we invoke Corollary 4.1 and deduce that V is live throughout R2. The combined
region R is now in danger of violating BND, since it now has an occurrence of V
(namely V1), but no occurrences of V in the boundary sets of what used to be R2.
These observations prove that the following step maintains the correctness of BND
for the variable V.

Algorithm 6.2 Propagation of an occurrence, step 1

Add V1 to each boundary set of R2.

This  rule  is  the  basis  for  the  term  "propagation."  Aside  from  initial  elements  of 
boundary  sets  from  the  construction  of  kernel  regions,  this  is  the  only  way  that  these 
sets  grow. 

We next consider what we must do to maintain the CHB invariant, relative to the
variable V. The answer is, nothing at all. To see why, consider an execution path in
R. Since R1 ≠ R2, such a path will lie entirely within R1 or R2 individually, or will
cross the arc A exactly once. Since one of the original regions has no occurrence of
V, there are no new requirements relating cohabitation of occurrences of V. Thus, no
change to the cohabitation graph, at least on V's behalf, is necessary.

The situation is quite different when we consider the CON invariant. To see why,
let us suppose that V appears in R1 but not in R2. Let us assume that there is
another variable U, which might or might not appear in R1, but in any case does not
have a generation there. An execution starting in R1, entering R2 (via A), and
passing through the generation of U can change the value of V if U and V share a
location. But since V is live at exit nodes of R2, the execution path might not have
the correct effect on the value of V. In order to maintain CON, we must add some
conflict relations. Let V1 be the occurrence of V.

Algorithm 6.3 Propagation of an occurrence, step 2

Establish conflict between V1 and every generation of region R2.
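
The two propagation steps (Algorithms 6.2 and 6.3) can be sketched together in Python. This is a naive, literal reading under stated assumptions: the `Region` record and its field names are hypothetical, occurrences are plain strings, and conflict is stored as a set of unordered pairs.

```python
class Region:
    """Hypothetical region record: boundary sets, generations, and a conflict relation."""

    def __init__(self, boundary_sets, generations):
        self.boundary_sets = boundary_sets     # list of sets of occurrences
        self.generations = set(generations)    # occurrences that are generations
        self.conflicts = set()                 # unordered pairs, stored as frozensets


def propagate(occ, region):
    """Propagate an unmatched occurrence throughout a region (naive version)."""
    # Step 1 (Algorithm 6.2): add the occurrence to each boundary set, to keep BND.
    for boundary in region.boundary_sets:
        boundary.add(occ)
    # Step 2 (Algorithm 6.3): conflict with every generation of the region, to keep CON.
    for gen in region.generations:
        region.conflicts.add(frozenset((occ, gen)))


# Example: R2 has two boundary sets and two generations; V1 is unmatched.
r2 = Region([{"U2"}, {"T3"}], generations=["U2", "T3"])
propagate("V1", r2)
```

Each call does work proportional to the number of boundary sets plus the number of generations of the region, which is why repeating it for every unmatched occurrence becomes expensive for large regions.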

To show that CON holds for the region R in light of the above action, we follow the
above reasoning which we used to motivate the action, except we can now observe
that no store to a location in R2 can affect V, so that the value of V as an
execution path crosses arc A (inductively correct from R1) is the same value that it
has at an exit node of R2. The correctness of CON is the last step in showing the
correctness of all the invariants, relative to V, when V1 is an unmatched occurrence.
Note the similarity of this algorithm and the rule for kernel conflict, page 33.

Because of the way that conflict is created for unmatched occurrences, the set of
such occurrences is called a brush set, the image being that these occurrences "brush"
over the region, creating conflict. It is a simple matter to go through R2 establishing
conflict between its generations and V1. It is also simple to see that doing this
naively is going to be very expensive as regions get large. It would have to be done
for every unmatched occurrence of the supply set, so the number of steps would be
proportional to the product of the size of the brush set and the number of
generations of the region. Fortunately, something much more efficient than a literal
interpretation can be achieved; the algorithms and data structures are discussed in
sections 6.6 and 6.7.


6.2 Matching Occurrences

In this section, we consider a pair of matched occurrences, i.e., p and q with v(p) =
v(q). It is possible that p = q, because of a loop or because of sideways propagation,
as pictured below (p = V1; A is the arc being compiled).


There is no action to be performed in this case, but we must of course prove that the
invariants hold after inclusion of A in the region. Observe that we must have R1 =
R2, since the same occurrence cannot be in separate regions. The BND invariant
follows directly from that. Consider an execution path in R = R1 ∪ {A}. We use
induction to prove correctness up to the first appearance of A, then observe that the
value of V is delivered at N1 in the location of the cohabitation class of p, and is
required at the same place at entry to N2. Thus, relative to V, correctness extends
across A, and we may repeat this argument till we get to the end of the execution
path.

If p and q are distinct, we write p = V1 and q = V2. In this case, R1 and R2 may or
may not be the same, but in either case, there is no need to change the boundary sets
inherited from R1 and R2. The BND invariant holds by direct induction. However,
CHB does not necessarily hold, because we must make sure that V1 and V2 are
assigned the same memory location, or we cannot make a correctness argument for
an execution path crossing arc A. The purpose of the following algorithm is to
formally state what must be done to the cohabitation graph to maintain CHB.

Algorithm 6.4 Match occurrences (for one variable)

Establish a cohabitation arc between V1 and V2.
if there is an inconsistency then call the inconsistency resolver.

If we can perform this part of the algorithm without getting an inconsistency, then
CHB follows for the usual reason: we can prove correctness across the arc A. To
prove CON, note that occurrences which conflict before the union will necessarily
conflict after the union. Thus the statement of CON for the new region is logically
weaker than the conjunction of CON for the original regions, and it holds.

If the above step results in inconsistency, then to maintain CHB and CON we must
modify code, as well as the cohabitation and conflict relations. This is the subject of
chapter 7.

It may have gone unnoticed that the remarks of this section apply to the case in
which the variable V2 is an extra occurrence under a KILL pseudo-op. It may well be
the case that V2 is the only occurrence of V in R2. The fact that V1 matched V2
means that V1 did not brush over R2, so no conflict was established between V1 and
any generation of R2. Further, the fact that V2 is not a generation means that no
conflict was established between it and any brush occurrence from the supply set of
N1. Thus, V2 is an innocuous bookkeeping device, as we claimed earlier.

6.3 Establishing a Cohabitation Arc

The algorithm for matching occurrences in the previous section begs the question
of how to establish the cohabitation arc between V1 and V2. There are two problems:
which direction does the arc go, and what is the cost of breaking it?

We first consider the question of direction. From the way regions are constructed,
it is clear that there is a V-free undirected path from A to V1 and from A to V2. If
boundary sets contain merely occurrences, there is no way to know whether A is in
G(V1,V2) or G(V2,V1). However, boundary sets are disjoint unions of subsets of
boundary sets of kernel regions, and it makes sense to say that an occurrence in a
particular boundary set was originally in a supply set or originally in a demand set.
(Because an occurrence can appear in both boundary sets of a kernel region, it may
have different origins with respect to different boundary sets.) This information is
simple to keep track of, and tells which direction the cohabitation arc must go. The
following picture makes this clear, and also shows that cohabitation arcs can point in
the direction opposite to arc A.


During compilation, the occurrence V1 starts in the demand set of the
bottom node, and eventually appears in the supply set of N1; dually, V2 will appear in
the demand set of N2. If we make the rule that the cohabitation arc goes from
"kernel supply set" to "kernel demand set", the arc direction will be from V2 to V1,
in keeping with the direction of flow. Put differently, we observe that A is in
G(V2,V1), not G(V1,V2). Formalizing this, we have:

Algorithm 6.5 Establish a cohabitation arc

if V1 was originally in a supply set
    if V1->V2 does not already exist
        make an arc V1->V2
else (V1 was originally in a demand set)
    if V2->V1 does not already exist
        make an arc V2->V1
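
A minimal Python sketch of Algorithm 6.5 follows. It assumes, hypothetically, that the cohabitation graph is a set of directed (source, destination) pairs and that the kernel origin of V1 has been recorded as the string "supply" or "demand"; neither representation is taken from the report.

```python
def establish_cohabitation_arc(arcs, v1, v2, v1_origin):
    """Add the cohabitation arc between v1 and v2 in the direction given by v1's origin.

    arcs      -- set of directed (source, destination) pairs (assumed representation)
    v1_origin -- "supply" or "demand": the kind of kernel boundary set v1 came from
    Returns True if a new arc was made, False if it already existed.
    """
    if v1_origin == "supply":
        arc = (v1, v2)      # arc runs from kernel supply set to kernel demand set
    elif v1_origin == "demand":
        arc = (v2, v1)      # v2 is then the occurrence that came from a supply set
    else:
        raise ValueError("origin must be 'supply' or 'demand'")
    if arc in arcs:
        return False
    arcs.add(arc)
    return True


# Example matching the situation described above: V1 began in a demand set,
# so the cohabitation arc runs from V2 to V1.
cohabitation_arcs = set()
made = establish_cohabitation_arc(cohabitation_arcs, "V1", "V2", "demand")
```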

This guarantees the following characterization of established cohabitation arcs:

Theorem 6.1 Suppose there is a directed V-free path from V1 to V2. There is an
undirected V-free path from V1 to V2 lying in some region and in G(V1,V2) if and
only if there is a cohabitation arc from V1 to V2.

Proof. (Referring to the above picture may help, except that the labels V1 and V2 are
reversed.) The backward implication is vacuous before compilation begins, because
there are no regions, and no cohabitation arcs. Since creation of a kernel region does
not add any new undirected V-free paths, we must merely insure that it does not
create any cohabitations between different occurrences of the same variable. This
would happen only for an assignment of the form V <- V; we assume these have been
eliminated. We next consider the compilation of an arc A from N1 to N2 which
results in the establishment of a cohabitation arc from V1 to V2. In this case the
algorithm for establishing cohabitation arcs says V1 was originally in a supply set, and
V2 was originally in a demand set. Now, step 1 of the algorithm for the propagation
of an occurrence is the only way that occurrences are propagated, so the only way
that V1 could be in the supply set of N1 or the demand set of N2 is if there is an
undirected V-free path in the respective region; dually for V2 and the node at the
other end of A. The paths together with the arc A constitute a connected subgraph
with no occurrence of V, and V is clearly live along A. By Corollary 4.1, A and each
path must lie in G(V1,V2), providing the required undirected V-free path. It will lie in
the region created by compiling arc A, and so meets all the requirements. Hence, if a
cohabitation arc exists, the path exists.

Conversely, suppose we compile an arc which completes the first undirected V-free
path of compiled arcs from V1 to a distinct occurrence V2, where the path lies in
G(V1,V2). Then there must be an occurrence of V1 in one of the sets attached to the
entry or exit node, and an occurrence of V2 in the other; V1 will necessarily be in its
kernel region's supply set, and V2 in its kernel region's demand set. Further, by
inductive use of the theorem itself, there is no cohabitation arc from V1 to V2. But
this is precisely the condition under which such an arc is established, by the above
algorithm. Thus, if the path exists, the arc exists. □

The other problem that this section considers is obtaining a good lower bound on
the cost of breaking the cohabitation. Breaking is done by some kind of move
instruction. The first thing to determine is how often a move must be done, which is
related to the part of the program in which the move can be placed, namely, the
intermediate graph G(V1,V2). Consider the nodes of G(V1,V2) which do not have an
occurrence of V. All arcs incident upon these nodes are in G(V1,V2), as is clear from
Corollary 4.1. The general picture of G(V1,V2) is thus:


To determine how often a move must occur we sum the frequencies of all the arcs
leaving the V1 node which are in G(V1,V2), or equivalently, we sum the
frequencies of arcs in G(V1,V2) which point to the V2 node. This gives the
transmission frequency from V1 to V2, denoted f(V1,V2).


If the node of V1 has a single outgoing arc, or the node of V2 has a single
incoming arc, f(V1,V2) is easy to calculate, namely, it is the frequency of that single
arc. Even in the general case it is not exceedingly expensive to calculate f(V1,V2);
nevertheless, it may be advantageous to use f, the frequency of the
arc being compiled, to serve as a lower bound for f(V1,V2). This is justified by the
following result.

Lemma 6.1 When establishing a cohabitation arc between V1 and V2, the frequency f
of the arc being compiled is less than or equal to f(V1,V2).

Proof. By the compilation order of arcs, we know that f is a minimum value along
some undirected V-free path between V1 and V2. Clearly, this value cannot
exceed f(V1,V2), since some arc touching the V1 (or V2) node will be on the path,
and any arc touching this node cannot have a frequency greater than f(V1,V2). □


Using f to approximate f(V1,V2) is in accord with our underestimation philosophy.
Only experience will tell if this results in a decrease in compile time, but we assume it
will.

Once we have a frequency, we try to estimate the cost. Suppose we break the
cohabitation with a single move instruction. We would choose s and c for this
instruction optimistically, but realistically: if V1 and V2 are already known to be not
in registers, for example, s and c reflect this knowledge. Given s and c, the cost is
s + c·f.

What this analysis does not take into account is that it may not be necessary to
insert a brand new move instruction. Instead, it might be possible to achieve the
move by some trick; for example, on the MC68000 there is a single instruction which
will push any subset of registers onto the stack. In this case, the s parameter becomes
0 (because no extra code is required), and the c parameter becomes the incremental
time cost of pushing one more word. Thus, the cost of breaking the arc becomes c·f.
If copies can be freely generated, it may be possible to break the cohabitation with s
and c both 0.
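
The cost estimate can be illustrated with a short Python sketch. It is an assumption-laden illustration only: the function names, the dictionary of arc frequencies, and the numeric values are hypothetical, and s and c are read as a code-size term and a per-execution time term, as the surrounding text suggests.

```python
def transmission_frequency(out_arcs, graph_arcs):
    """Sum the frequencies of arcs leaving the V1 node that lie in G(V1,V2).

    out_arcs   -- dict mapping arc name to frequency, for arcs leaving the V1 node
    graph_arcs -- set of arc names belonging to G(V1,V2) (assumed to be given)
    """
    return sum(freq for arc, freq in out_arcs.items() if arc in graph_arcs)


def break_cost(s, c, f):
    """Estimated cost s + c*f of breaking a cohabitation arc."""
    return s + c * f


# Three arcs leave the V1 node; only "a" and "b" lie in G(V1,V2).
f_v1_v2 = transmission_frequency({"a": 10, "b": 5, "c": 2}, {"a", "b"})

# Suppose the arc being compiled is "b"; its frequency is the cheaper lower bound.
f = 5

cost_new_move = break_cost(s=1, c=1, f=f)     # a brand new move instruction
cost_piggyback = break_cost(s=0, c=1, f=f)    # one more word in an existing push
cost_free_copy = break_cost(s=0, c=0, f=f)    # freely generated copies
```

Here f_v1_v2 is 15 while the lower bound f is 5, so using f underestimates the cost in every variant, in line with the optimistic strategy described above.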

This trickiness in moving quantities presents a dilemma. One has the feeling that
very often, the tricky instruction which breaks the arc with the estimated (minimal)
cost is not going to be an option. If the code-generator over-estimates the cost, it
may miss a chance at an optimization. If it under-estimates the cost, it may spend a
lot of time in the inconsistency resolver looking futilely for trickiness that does not
exist. Once a code-generator is in operation, it will be possible to gain some
experience in how to make this compromise. There is even the possibility of being
able to dynamically adjust its "optimism" in assigning a cost. The maximum possible
gain is the cost of a naive fix minus the cost of a subtle fix, e.g., s + c·f for a move
instruction minus, say, 0 for freely generated copies, or a different s + c·f for some
more expensive instruction variant. The cost of looking for a subtle fix will be
roughly proportional to the size of G(V1,V2). With experience, we can learn how
often optimism pays off, and how to estimate the actual cost of searching. The
code-generator can adjust its under-estimation so that the expected payoff exceeds the
expected cost.


6.4 Inconsistency Detection

We maintain the consistency of cohabitation and conflict relations when compiling
arcs. There are two aspects to this: first, detecting whether an inconsistency would
arise if the arc were compiled without any modification of existing code, and second,
if such an inconsistency would arise, modifying the code so that consistency is
maintained. This section discusses the detection problem; the resolution problem is
described in the next chapter.

Thus far, the only representation that has been discussed for the cohabitation
relation is its (labeled and directed) graph. If this is the only representation of
cohabitation, detection is very expensive, because to determine cohabitation, we must
enumerate members of a cohabitation class (via the graph itself). We provide an
oracle for rapidly determining whether two occurrences are in the same cohabitation
class, given that cohabitation classes are continually combined by new cohabitation
arcs. We use the standard FIND/UNION technique for this purpose (see [7]). Note
that it is occasionally necessary to tear up a cohabitation class into two pieces, during
inconsistency resolution. The updating of this structure when doing so is
straightforward, but proportional to the size of the class. Since we assume that
breaking up large cohabitation classes is rare, and since the cost of doing so is at least
proportional to the size of the class, we assume the expense is worth it.
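
A minimal Python sketch of a FIND/UNION oracle for cohabitation classes follows. The class and method names are hypothetical, and union by size with path compression is one common variant of the standard technique, chosen here for illustration; the report does not say which variant it uses. Tearing a class into two pieces is not shown.

```python
class CohabitationClasses:
    """Union-find over occurrences; each set is one cohabitation class."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        """Return the representative of the class containing x."""
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the classes of a and b, as when a new cohabitation arc joins them."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra               # attach the smaller class under the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra

    def cohabit(self, a, b):
        """Oracle: are a and b in the same cohabitation class?"""
        return self.find(a) == self.find(b)


classes = CohabitationClasses()
classes.union("V1", "T1")
classes.union("U1", "T2")
before = classes.cohabit("V1", "U1")
classes.union("T1", "T2")
after = classes.cohabit("V1", "U1")
```

In the example, V1 and U1 do not cohabit until the arc between T1 and T2 merges their two classes.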

To understand the data structures used in inconsistency detection, we examine
more closely the circumstances under which the relations change. The simplest case
occurs in intra-region compilation. Since there are no brush occurrences in this case,
no new conflict is generated. Assuming inductively that the region has no
inconsistencies before the arc is compiled, the only way that one could arise is
because of the addition of the new cohabitation relations required when matching
occurrences. If Algorithm 6.4 says to establish an arc between occurrences which
already cohabit (a question for which we now know how to compute an answer),
there is no possible problem. But if the two occurrences do not cohabit, they may
conflict, and what we want to know is whether any two occurrences of the respective
cohabitation classes of the ends of the arc are in conflict. This suggests the concept
of the conflict of cohabitation classes, or class conflict, where the rule is that
cohabitation classes conflict if any of the occurrences in the respective classes are in
conflict. We shall not immediately discuss the implementation of this relation, but
assuming an oracle for it, we will look more closely at the interplay between the
various cohabitation relations we want to establish. What we do not want to do is to
establish a cohabitation which will have to be broken a few steps later. To see how
this can happen, consider the following example:

[Figure: supply set above, demand set below; solid lines are pre-existing
cohabitations, dashed lines are desired cohabitations.]

Assume that the only conflict among the six points is that indicated, and that no
other occurrences cohabit with these. If we try to establish the middle arc first, there
is no conflict preventing it. If we next attempt either of the other cohabitations, an
inconsistency will arise. If this second arc is more expensive to break than the middle
one, then we would break the middle one, the one we just established. But if it is
cheaper to break, or even the same, we would "break" this cohabitation before it
really was ever established. The same would happen with the third arc. Now, assume
all arcs are equally expensive. If there is no way for the resolution machinery to
re-establish a cohabitation arc (and remove the code used to break it), then in the
case where all cohabitations are equally expensive to break, we wind up with two
broken arcs, and one established. But if we had waited and considered all three arcs
at once, it would be clear that the best approach is to break (i.e., not establish) the
middle arc, and to establish the others. Even if the resolution machinery is capable
of reconsidering its placement of moves, it is more work than doing things right the
first time.

In order to gather up all the cohabitations which are to be considered together, we
look at the cohabitation classes of the occurrences in the boundary sets. Just as we
defined the idea of class conflict by raising the conflict relation to cohabitation
classes, we define class matching by raising the matched relation to cohabitation
classes: that is, two cohabitation classes are matched if each contains an occurrence
such that the pair of occurrences is matched along the arc being compiled. Then we
form the graph of this relation, calling it the class graph, since its nodes are
cohabitation classes. The picture below is the class graph for the relations of the
previous picture, with cohabitation classes encircling their individual occurrences, and
the lines showing class matching.




The connected components of the class graph, called class components, will be in
distinct cohabitation classes after compilation of the arc. (A cohabitation class with
no occurrence in a boundary set will be a class component unto itself.) The purpose
of this construction is:

Definition. In a class component, a match inconsistency arises if two (cohabitation)
classes are in class conflict in the original region.

□ 

Using  this  definition,  we  revise  the  algorithm  for  matching: 

Algorithm 6.6 Match occurrences (intra-region case, all variables)

for each class component
    if there is match inconsistency in the component
        Call the inconsistency resolver on the component
    else
        Establish the cohabitation arcs for the component
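
Algorithm 6.6 can be sketched in Python as follows. All names are hypothetical. The sketch assumes that each occurrence's cohabitation class is available through a dictionary (in practice this would be the FIND oracle), that class conflict is given as a set of unordered class pairs, and that the example data are invented for illustration rather than taken from the figure.

```python
from itertools import combinations


def class_components(class_of, matched_pairs):
    """Connected components of the class graph induced by the matched pairs."""
    adjacent = {}
    for p, q in matched_pairs:
        cp, cq = class_of[p], class_of[q]
        adjacent.setdefault(cp, set()).add(cq)
        adjacent.setdefault(cq, set()).add(cp)
    seen, components = set(), []
    for start in adjacent:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                       # depth-first walk of one component
            c = stack.pop()
            if c not in component:
                component.add(c)
                stack.extend(adjacent[c] - component)
        seen |= component
        components.append(component)
    return components


def has_match_inconsistency(component, class_conflicts):
    """True if two classes of the component are in class conflict."""
    return any(frozenset(pair) in class_conflicts
               for pair in combinations(component, 2))


def match_all(class_of, matched_pairs, class_conflicts):
    """Return (problems for the resolver, arcs that can be established directly)."""
    to_resolve, to_establish = [], []
    for component in class_components(class_of, matched_pairs):
        pairs = [(p, q) for p, q in matched_pairs if class_of[p] in component]
        if has_match_inconsistency(component, class_conflicts):
            to_resolve.append(pairs)
        else:
            to_establish.extend(pairs)
    return to_resolve, to_establish


# Invented example: classes A, B, C form one component with a B/C conflict;
# classes D and E form a second, conflict-free component.
class_of = {"s1": "A", "s2": "A", "s3": "A", "d1": "B", "d3": "B",
            "d2": "C", "w1": "D", "w2": "E"}
matched = [("s1", "d1"), ("s2", "d2"), ("s3", "d3"), ("w1", "w2")]
to_resolve, to_establish = match_all(class_of, matched, {frozenset(("B", "C"))})
```

The three arcs of the first component are handed to the resolver together, so it can weigh them against each other instead of establishing one and breaking it a few steps later.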

Theorem 6.2 Cohabitation arcs added by the above step do not cause an
inconsistency, and conversely, the inconsistency resolver is given a problem only
when the addition of the cohabitation arcs would cause an inconsistency.

Proof. To start the induction, we use the fact that the relations of the region are
consistent before compilation of this arc. Inductively assuming consistency, consider
adjoining all of the arcs of a class component. Then all of the old cohabitation
classes wind up in the same new cohabitation class, because of the connectedness of
the class graph by arcs just added, and because of pre-existing cohabitation of the
elements of old cohabitation classes. Suppose that two of the old cohabitation classes
involved in this merge had pre-existing class conflict; then an inconsistency would
arise. Thus the algorithm gives problems to the inconsistency resolver only when an
inconsistency would actually arise. Conversely, if an inconsistency cycle arose, it has
to involve conflict of old cohabitation classes in the class component, since these are
the only ones that are merged, and since no conflict arcs are added.

□ 
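
The intra-region step of Algorithm 6.6 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the report's implementation: the names match_intra_region, class_of and in_conflict are hypothetical, cohabitation classes are represented by plain identifiers, and in_conflict stands in for the class conflict oracle discussed in section 6.6.

```python
from collections import defaultdict
from itertools import combinations

def match_intra_region(desired_matches, class_of, in_conflict):
    """Split the desired matches into class components and classify each.

    desired_matches: pairs of occurrences to be matched along the arc.
    class_of: maps an occurrence to the id of its cohabitation class.
    in_conflict: predicate on two class ids (assumed class conflict oracle).
    Returns (to_establish, to_resolve); each entry lists the matches of
    one class component.
    """
    desired_matches = list(desired_matches)
    parent = {}

    def find(x):
        # Union-find over class ids, used to get connected components.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Arcs of the class graph are induced by the desired matches.
    for a, b in desired_matches:
        ra, rb = find(class_of[a]), find(class_of[b])
        if ra != rb:
            parent[ra] = rb

    classes = defaultdict(set)    # component root -> class ids in it
    matches = defaultdict(list)   # component root -> matches in it
    for a, b in desired_matches:
        root = find(class_of[a])
        classes[root].update((class_of[a], class_of[b]))
        matches[root].append((a, b))

    to_establish, to_resolve = [], []
    for root, members in classes.items():
        # Match inconsistency: two classes of the component already conflict.
        bad = any(in_conflict(p, q) for p, q in combinations(members, 2))
        (to_resolve if bad else to_establish).append(matches[root])
    return to_establish, to_resolve
```

Checking every pair of classes in a component is the simplest reading of the definition; a real implementation would presumably try to ask the oracle less often.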

Thus, we have reduced the problem of inconsistency detection to the problem of
knowing which classes are in conflict before compilation of the arc (which we have
yet to discuss) and to an oracle for cohabitation, namely FIND. In the case that we
establish an arc in the cohabitation graph between previously disjoint cohabitation
classes, we also do a UNION of the cohabitation classes, so that FIND knows about
the new cohabitation relation; we must ensure that the class conflict oracle knows
about the cohabitation class merge. This is discussed in section 6.6.


We next consider inter-region compilation. As usual, we assume that the
constituent regions have consistent relations; since the sets of occurrences of the two
regions are disjoint, any inconsistency must involve at least two of the new relations
created when compiling the arc in question, and at least one of them must be a
cohabitation arc. This is because an inconsistency is a cycle of relations, of which
precisely one is conflict, the rest being cohabitations. This cycle must involve at least
one occurrence in each of the constituent regions, so must cross from one region to
the other at least twice. Moreover, realize that when compiling an inter-region arc,
the boundary sets in question are partitioned into brush sets and matched sets. The
following picture is useful in visualizing the relations:


[Figure: the arc A joining two regions; each region's boundary set is partitioned into
a brush set and a matched set, and each region has a generation set.]


As in intra-region compilation, we do not try to add one relation at a time. Rather,
we look at cohabitation classes of the occurrences in the boundary sets; remember,
the regions are previously disjoint. As before, we form the class graph on the
cohabitation classes, where the arcs are induced by the desired (but not yet
established) cohabitations due to the arc being compiled; this time the graph is
bipartite. In each of the connected components (again called class components) of
the class graph, we can determine whether any inconsistency will arise involving only
new cohabitations by the same definition of match inconsistency. The question of
class conflict has to be asked only of cohabitation classes in the same constituent
region, and therein lies the elegance of match inconsistency in inter-region
compilation: usually a class component has at most one class in each region, so that
there is no necessity to ask any questions of the class conflict oracle.

There is only one remaining way in which an inconsistency can arise: its cycle
must involve new conflict.

Definition.  In  a  class  component,  brush-generation  inconsistency  arises  if  the 
component  has  in  one  region  a  class  that  contains  a  brush  occurrence,  and  in  the 
other  region  a  class  that  contains  a  generation. 

□ 

This  leads  to  the  rest  of  the  revised  algorithm  for  matching. 

Algorithm  6.7  Match  occurrences  (inter-region  case,  all  variables) 

for each class component
    if there is match or brush-generation inconsistency
        Call the inconsistency resolver on the component
    else
        Establish the cohabitation arcs for the component

Theorem 6.3 During inter-region compilation, the algorithm gives class components
to the inconsistency resolver precisely when inconsistency would arise by establishing
the desired relations.

Proof. We have already seen that if an inconsistency arises, it involves either two
new cohabitations or a new cohabitation and a new conflict. In the first case,
Theorem 6.2 states the desired result, with the proof differing only in how the
induction is started. Here, the initial relations for the (combined) region are the
unions of the relations on the constituent regions. Since the relations are consistent
on their respective regions, and since the domains are disjoint, this initial relation is
consistent and the induction started.

In the second case, it is clear that an inconsistency cycle having new conflict from
a brush occurrence o to some generation of the other region must go through at least
one cohabitation added while compiling this arc, and must contain an occurrence in
each of the constituent regions. The cohabitation classes to which o and the
generation belong are thus matched, and so are in the same class component. This is
precisely the condition that there be a brush-generation inconsistency. Conversely, if
there is brush-generation inconsistency, it is clear that an inconsistency cycle exists,
so no non-inconsistencies are sent to the resolver.

□

Note that the detection of brush-generation inconsistency avoids any use of the class
conflict oracle, using only questions about cohabitation and whether cohabitation
classes have generations. Since cohabitation classes are always formed by the union
of cohabitation classes, it is easy to keep track of whether they have generations.
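
A minimal sketch of the brush-generation test, assuming each class in a component carries three hypothetical attributes (its constituent region, a "has generation" flag, and a flag saying whether it contains a brush occurrence of the arc being compiled). None of these names come from the report; they only illustrate that the test needs no call to the class conflict oracle.

```python
from dataclasses import dataclass

@dataclass
class ClassInfo:
    region: str                  # "left" or "right" constituent region (assumed encoding)
    has_generation: bool         # the class contains a generation
    has_brush_occurrence: bool   # the class contains a brush occurrence

def brush_generation_inconsistency(component):
    """True if one region has a class with a brush occurrence and the
    other region has a class with a generation."""
    for side, other in (("left", "right"), ("right", "left")):
        brushed = any(c.region == side and c.has_brush_occurrence for c in component)
        generated = any(c.region == other and c.has_generation for c in component)
        if brushed and generated:
            return True
    return False

def merged_has_generation(a, b):
    # A union of classes has a generation iff one of its parts does.
    return a.has_generation or b.has_generation
```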

In summary, this section has reduced the problem of inconsistency detection to
that of two as yet undescribed oracles:

Given two cohabitation classes, are they in conflict?

Given a node, enumerate the elements of its supply or demand set.

The emphasis has been on minimizing use of the class conflict oracle, which we were
able to do particularly well for inter-region compilation.

6.5 History Trees and Cohabitation Classes

This section describes data structures that are central to the implementation of
both the class conflict oracle and the enumeration of boundary sets. The history
tree is a record of region merges. The leaves of this tree are kernel regions. Each
internal node corresponds to the compilation of an inter-region arc; such nodes have
two descendants, corresponding to the constituent regions. We use the direction of
the arc being compiled (A) to distinguish the left descendant (which A leaves) from
the right descendant (which A enters). Each node has a pointer to its immediate
ancestor; the top node of a history tree will of course have a nil pointer to its
immediate ancestor. Because of the scattered nature of compilation, there will be a
forest of history trees.

Suppose we have two kernel regions. They are in the same region precisely when
they are in the same history tree. Moreover, if they are in the same history tree,
their least common ancestor corresponds to the arc-compilation which first put them
in the same region. As we shall see, the least common ancestor algorithm plays a
role in the class conflict oracle. It becomes natural to think of a node of a history
tree as corresponding to a time in the compilation process: nodes higher in the tree
happen after nodes lower in the tree. It will turn out to be important for each
cohabitation class to have a pointer to a node in the history tree, corresponding to
the time at which it was formed (section 6.6).
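
The history tree and its least common ancestor query might be sketched as below. This is an assumption-laden illustration: the class HistoryNode and the functions merge and least_common_ancestor are hypothetical names, and the ancestor search is the naive one (walk both chains to the root), not necessarily the algorithm the report has in mind.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class HistoryNode:
    left: Optional["HistoryNode"] = None    # region the compiled arc leaves
    right: Optional["HistoryNode"] = None   # region the compiled arc enters
    parent: Optional["HistoryNode"] = None  # None (nil) at the top of a tree

def merge(left, right):
    """Record the compilation of an inter-region arc from left to right."""
    node = HistoryNode(left, right)
    left.parent = node
    right.parent = node
    return node

def ancestors(node):
    """The node itself followed by its ancestors, lowest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def least_common_ancestor(a, b):
    """Lowest node above both a and b, or None if they lie in different
    history trees (that is, in different regions)."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    return None
```

A result of None corresponds to the statement that two kernel regions are in the same region precisely when they are in the same history tree.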

The representation that we use for cohabitation classes is similar to that of the
history tree. Cohabitation classes either come directly from kernel regions or are the
unions of other cohabitation classes. In the implementation of the class conflict
oracle, it is necessary to know whether the union arose from an inter-region or
intra-region compilation; a bit can record this information. It is necessary to know
when the union occurred, that is, each cohabitation class has a pointer to a node in
the history tree; this is called its formation time. It is also necessary to be able to
retrieve the cohabitation classes whose union formed a cohabitation class; thus a
cohabitation class, like a node in the history tree, has pointers to two descendants. In
intra-region unions, the order of the descendants is irrelevant. But in the inter-region
case, as in the history tree, we use the direction of the arc being compiled to
determine a left and right son. Thus, for these unions, we know that the left
sub-cohabitation class lies in the left sub-region of the corresponding formation time
in the history tree. In inter-region compilation, if a class component has several
cohabitation classes in the same constituent region, these are gathered up first by
intra-region unions; the last step in forming the cohabitation class is one inter-region
union. The rationale for this becomes clear when the class conflict oracle is
discussed.

The previous paragraph presents all the fields of the cohabitation class data
structure that are necessary for the class conflict oracle. We also want to be able to
ask whether two occurrences cohabit. For this, we use upward pointing arcs, and the
standard FIND/UNION machinery.
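
The fields just listed, together with the upward pointers used by FIND, could be laid out roughly as follows. This is a sketch under assumptions: the record name CohabClass and its field names are hypothetical, the formation time is treated as an opaque reference to a history tree node, and the FIND shown does no path compression, so that the two descendant pointers stay intact for the class conflict oracle.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(eq=False)
class CohabClass:
    formation_time: Any                    # history tree node at which the class was formed
    inter_region: bool = False             # the bit: inter-region vs. intra-region union
    left: Optional["CohabClass"] = None    # descendant in the left sub-region (inter-region case)
    right: Optional["CohabClass"] = None   # the other descendant
    has_generation: bool = False
    up: Optional["CohabClass"] = None      # upward pointing arc used by FIND

def find(c):
    """Follow upward pointers to the current cohabitation class."""
    while c.up is not None:
        c = c.up
    return c

def union(a, b, formation_time, inter_region):
    """Form the class that is the union of the classes of a and b."""
    a, b = find(a), find(b)
    if a is b:
        return a
    merged = CohabClass(formation_time, inter_region, a, b,
                        a.has_generation or b.has_generation)
    a.up = merged
    b.up = merged
    return merged

def cohabit(a, b):
    return find(a) is find(b)
```

Keeping each union as a new node, rather than folding one class into the other, is what lets the decomposition of a class be retrieved later.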

As we saw in the previous sections, brush sets play an important role in compiling
an arc. It is convenient to record the two brush sets involved in an inter-region arc
compilation (one from the supply set, one from the demand set) in two fields in
each node of the history tree. Although brush sets can be of arbitrary size, all that is
recorded is a pointer (in section 6.7 we will discuss efficient representation of the
brush sets themselves). Thus, the size of the history tree grows linearly with the size
of the program.

In the next two sections we will propose other fields for history tree nodes and
cohabitation classes. The extra space for these fields buys time. The fields described
in this section are a minimal set.

6.6 The Class Conflict Oracle

There are only two operations of interest on the class conflict relation: we want to
know whether two (cohabitation) classes are in conflict, and we want to coalesce
classes in it where the classes are not in conflict. There are several representations
which one might use to achieve this. We discuss several which have been tentatively
discarded. The most obvious is a bit matrix representation, which has the advantage
that one can determine quickly whether conflict exists. The disadvantage here is that
space grows irrevocably as the square of the number of cohabitation classes and that
coalescing two classes requires a number of operations proportional to the total
number of classes in the class conflict relation. Another idea is to use a direct
implementation of the graph of the relation. This has the disadvantage that
determining conflict (adjacency of two nodes) has a cost proportional to the
minimum of the degrees of the nodes, and that the space cost will be proportional to
the number of arcs, which in practice will probably grow as the square of the number
of cohabitation classes, and with a non-negligible constant of proportionality. This
method does have the advantage that coalescing of nodes can be done in constant
time, given the proper representation and the willingness to live with a few redundant
arcs. Increasing the time slightly can save the space of the redundant arcs, if this is
deemed necessary.

The solution we propose for the class conflict oracle is based on intimate
knowledge of the way that conflict of occurrences, and thus of cohabitation classes,
arises. As we said in the previous section, the history tree is the key data structure.
What does the history tree have to do with conflict and cohabitation? Because of the
compilation process, conflict arises either in kernel regions or in the merging of
disjoint regions, never in the compilation of an intra-region arc. The way that the
history tree is used to determine class conflict has the following outline:

    Find the least common ancestor (lca) in the history tree of the formation
    times of the cohabitation classes in question. If these classes conflict, it is
    either because at least one of them was formed at the lca, and the
    constituent parts were previously in conflict, or because a conflict arc was
    formed at the lca, and one of the classes had a brush occurrence in one of
    the regions while the other class had a generation occurrence in the other
    region.

We now give some specifics on the implementation of the class conflict oracle.
Let the classes be c1, c2 and their respective formation times be ft1, ft2, and their least
common ancestor be t. After t is calculated, the first case breakdown is on whether
either of ft1 or ft2 is equal to t. Consider first when neither is, where we argue as
follows. No conflict could exist between c1 and c2 before t, since those classes then
lie in disjoint regions. On the other hand, no conflict will be added between c1 and
c2 after t, since conflict is not added once occurrences are in the same region, and
since neither cohabitation class expands after t (use definition of formation point and
history tree). Thus, c1 and c2 will be in conflict if one contains a generation, and if
the other has an occurrence in a brush set at t. We have already seen that we need
to keep track of whether a cohabitation class has a generation. The problem is in
determining if a cohabitation class had a brush occurrence at a given compilation.
To aid in this determination, we propose that each cohabitation class be given a field
which points to the node in the history tree corresponding to the last compilation in
which some occurrence of the class was in a brush set, called the last brush time. The
extra space required for this is only linear. For cohabitation classes formed in
intra-region compilation this field is some reserved value, say NIL, meaning that no
element of the class has yet been in a brush set. The same value is used in
inter-region compilation if a cohabitation class is formed none of whose elements is
in a brush set. Otherwise such classes receive a value equal to the formation time.
Note that the determination of the last brush time, as well as the updating of this
field of the proper set of classes, can be done at no increase in asymptotic cost, since
each of these classes has to be examined in the process of inconsistency detection
described earlier. If either c1 or c2 has a last brush time of exactly t, we know that it
has a brush occurrence at t, so that if the other of c1 or c2 has a generation, c1 and c2
are definitely in conflict.

Note that a brush time of a cohabitation class, if non-NIL, must be at or after the
formation time; on the history tree, it is at or above the formation time. As a
by-product of the least common ancestor algorithm, it is easy to tell if this last brush
time is before t, i.e., below. If so, no occurrence of it was brushed at t; using this, we
can often quickly determine that c1 and c2 are definitely not in conflict. Thus, the
only situation in which we cannot decide is when c1 has a brush time after t and c2
has a generation, or vice versa, or both. This is one of the reasons that brush sets are
recorded in the history tree.

We use these recorded brush sets as follows. As a by-product of the least common
ancestor algorithm, we can determine which of c1 and c2 are in the left and right
subregions and so can tell which of the two brush sets we want to search for some
occurrence being in c1. If we find one, c1 and c2 are in conflict; if not, the only
possibility is that c1 had a generation, and c2 had an occurrence which was brushed.
This possibility might have already been eliminated; if not, we use the same
technique to see if c2 had a brush occurrence in this compilation. This finally
decides the question of conflict when ft1 ≠ t ≠ ft2.
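
The case ft1 ≠ t ≠ ft2 can be sketched as below, as one possible reading of the text rather than a definitive implementation. All names are hypothetical: Node is a history tree node holding the two recorded brush sets keyed by subregion, Cls is a cohabitation class with its member occurrences, "has generation" flag, last brush time, and the subregion of t in which it lies. For simplicity the sketch finds "above/below" by walking parent pointers instead of reusing the by-products of the least common ancestor algorithm.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Node:
    parent: Optional["Node"] = None
    # Recorded brush sets of this compilation, keyed by subregion (assumed layout).
    brush: dict = field(default_factory=lambda: {"left": set(), "right": set()})

@dataclass(eq=False)
class Cls:
    members: set                      # occurrences of the class
    has_generation: bool
    last_brush_time: Optional[Node]   # None plays the role of NIL
    side: str                         # subregion of t in which the class lies

def strictly_above(a, b):
    """True if history node a is a proper ancestor of b (a happened after b)."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False

def brushed_at(c, t):
    """Did some occurrence of class c lie in a brush set at compilation t?"""
    bt = c.last_brush_time
    if bt is None:
        return False                  # never brushed
    if bt is t:
        return True                   # last brush time is exactly t
    if not strictly_above(bt, t):
        return False                  # last brush time is before (below) t
    # Last brush time is after t: search the recorded brush set at t.
    return any(occ in c.members for occ in t.brush[c.side])

def conflict_when_neither_formed_at(c1, c2, t):
    """Class conflict of c1 and c2 when neither was formed at their lca t."""
    return ((c2.has_generation and brushed_at(c1, t)) or
            (c1.has_generation and brushed_at(c2, t)))
```

Only the last branch of brushed_at touches a recorded brush set; the earlier branches are the quick answers that the last brush time field is meant to provide.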

We consider next the question of conflict when exactly one of the formation times
is the compilation time, say ft1 = t ≠ ft2. If c1 was formed by an intra-region union,
then we use its decomposition into smaller classes c11 and c12, and the fact that c1
conflicts with c2 iff c11 conflicts with c2 or c12 conflicts with c2. (Note that in these
subproblems, the least common ancestors can be no higher than t, a fact useful in
computing them.) Thus, we turn our attention to the case when c1 was formed
during inter-region compilation, and our first task is to determine if conflict between
the classes was created at this time. To do this, we again use the known
decomposition of c1 into c11 and c12, choosing the numbering so that c12 lies in the
same subregion as c2 and c11 in the other subregion. To determine if a conflict arises
at the compilation t, we apply the "ft1 ≠ t ≠ ft2" technique to c11 and c2. This applies
because the least common ancestor of c11 and c2 is surely t, and the formation time
of neither is equal to t. If there is a conflict here, then c1 and c2 are in conflict. If
not, the question is decided by whether there is conflict between c12 and c2. (Note
that in this sub-problem, the least common ancestor is no higher than the immediate
descendant of t in which c2 (and c12) lies.) This disposes of the case ft1 = t ≠ ft2.

Last we examine the case ft1 = t = ft2. (The equality of ft1 and ft2 should
probably be checked before actually doing the lca algorithm, since this case may arise
fairly often.) If t is a kernel region, we determine the class conflict of c1 and c2 by
enumerating their elements and looking at kernel conflict. The small numbers
involved make this quite reasonable. Henceforward, we assume that t is not a kernel
region. If either of c1 or c2 arose from an intra-region union, we treat it as c1 was
treated in the corresponding circumstance of the previous case. Thus we consider only
the case in which c1 and c2 were both formed in an inter-region union at time t. This
time we use the decomposition of ci into ci1 and ci2, where c11 and c21 both lie in the
same subregion, and c12 and c22 lie in the other. To determine if a conflict was
created at t, we treat the pairs c11, c22 and c12, c21 by the analysis described in the
"ft1 ≠ t ≠ ft2" case. If neither results in known conflict, we examine the pairs c11, c21
and c12, c22 recursively (knowing in each case that the least common ancestor is lower
than t), to see if conflict existed previously. This decides the question of conflict
when ft1 = t = ft2, the last case to be considered.

The algorithm for class conflict just described could, in theory, be quite expensive.
The guess and hope is that in actual programs, it will be pragmatic, because it seems
that one would arrive fairly quickly at the case where ft1 ≠ t ≠ ft2, which involves no
recursion, and for which the heuristic using the last brush time will usually yield a
speedy answer one way or the other. If one is willing to pay with space to buy time,
a possible improvement is to turn the last brush time field of a cohabitation class into
a "brush time list," so that the determination of whether any occurrence of a class is
brushed at a given time can be answered more quickly.

An interesting feature of this design is that we do not require an oracle for
conflict. Instead, we are relying on solutions to the following as yet undiscussed
problems:

Record a brush set.

Enumerate the elements of a recorded brush set.

These are discussed in the next section.

6.7  Representation  of  Boundary  and  Brush  Sets 

Sections 6.4 and 6.6 have reduced the problem of inconsistency detection to the
problem of enumerating elements of boundary and brush sets, and of being able to
record brush sets. The solutions to these problems involve a common data structure,
due to the fact that boundary sets grow by the adjunction of brush sets, and brush
sets are formed by boundary sets, minus matched occurrences. A naive
representation of these objects would result in a crippling space requirement.


There are two ways in which a naive implementation results in non-linear expense.
One is in the construction of a brush set, and the other is in the propagation of this
brush set to all of the boundary nodes of the region. Let us work on the latter
problem first, assuming that we have a brush set in hand. The goal is to form the
(known to be disjoint) union of this brush set with all of the boundary sets, for every
boundary node of the region, in constant time. What we must have, then, is a common
place to put the brush set, reachable from all of the boundary sets of a region. To
achieve this, we note that

A boundary set may be represented as a list (meaning the union) of brush
sets, in the order in which the brush sets are added to the boundary set.

Different boundary sets may have a very long common tail, thereby saving
much space.

Behold, we have already described just such a data structure: the history tree, with
its brush sets! To see how this data structure is used for this purpose, suppose we are
given a node for which we desire an enumeration of the elements of its supply set.
We first enumerate all of the elements which were in the supply set of the kernel
region which the node was compiled into; these can be done economically simply by
examining the code for the kernel region. Next, move up to the immediate ancestor
in the history tree. Since the history tree also points downward, we can determine
whether the node is the left or right son, and thus, we know which of the two brush
sets was adjoined to the boundary set of the node during its first involvement in
inter-region compilation. The elements of the appropriate brush set can be
enumerated. We then advance from the present history node to its ancestor, again
enumerate the appropriate brush set, and so on up the tree. This technique ensures
linear overall requirements for the space used by boundary sets, and for the time
required for their enumeration, up to linear space and enumeration time requirements
for brush sets (and modulo null brush sets, which are probably very rare). It is
delightful that the history tree plays a crucial role in two seemingly unrelated oracles:
class conflict determination, and boundary set enumeration.
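
The walk up the history tree might look like the following sketch. The names are hypothetical, and the sketch deliberately sidesteps the question of which recorded brush set goes to which side: it simply assumes each history node stores, for each son, the brush set that was adjoined to that son's boundary sets. The kernel region's own supply set is assumed to be available as a list on the leaf.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass(eq=False)
class HistoryNode:
    left: Optional["HistoryNode"] = None
    right: Optional["HistoryNode"] = None
    parent: Optional["HistoryNode"] = None
    adjoined_to_left: list = field(default_factory=list)   # brush set adjoined to the left son's boundary sets
    adjoined_to_right: list = field(default_factory=list)  # brush set adjoined to the right son's boundary sets
    kernel_supply: list = field(default_factory=list)      # leaves only: supply set within the kernel region

def enumerate_supply_set(kernel: HistoryNode) -> Iterator:
    """Enumerate a supply set as the kernel's own elements followed by the
    brush sets adjoined at each merge on the way up the history tree."""
    yield from kernel.kernel_supply
    child, node = kernel, kernel.parent
    while node is not None:
        # The downward pointers tell us whether we came from the left or right son.
        if node.left is child:
            yield from node.adjoined_to_left
        else:
            yield from node.adjoined_to_right
        child, node = node, node.parent
```

Two boundary sets in the same region share every history node above their least common ancestor, which is the "long common tail" noted above.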


We next turn to the problem of brush sets. Unlike boundary sets, these are not
constructed as unions of other sets; rather they consist of a boundary set with some
specified set of occurrences removed because of matching. I have not been able to
devise a technique to have both linear space and linear enumeration time bounds.
However, there are techniques to achieve either at the expense of the other, and ways
to combine the techniques and dynamically decide upon the best technique. We first
discuss a linear space technique. The idea here is to utilize the fact that a brush set
can be formed by two other objects: the "base" boundary set from which
occurrences are stripped, and the boundary set that provides the matched
occurrences. If we can recover these objects, then the elements of the brush set can
be enumerated.

The history tree, as with boundary sets, is the key to an efficient representation.
Recall that brush sets are "recorded" in each node of this tree; all that is needed to
achieve this is a pointer to the arc whose compilation is being remembered. From
the arc, we can get to the nodes in question, from which we may enumerate the
boundary sets by tracing up the history tree, recursively enumerating other brush sets,
until we reach the history node whose brush set is desired.

It is clear that this scheme requires linear space. The rub is that the enumeration
of a brush set may require time not necessarily proportional to the number of its
elements. With the present representation, we have to enumerate the elements of the
base boundary set, and then suppress in the enumeration of the brush set any
occurrences matched in the opposing occurrence set. The first problem is that
determining whether an occurrence is matched requires in this scheme an
enumeration of the opposing occurrence set, which may be expensive. The second is
that if too many elements are excluded, we spend a lot of time enumerating not very
many elements of the brush set. To solve the first problem, we note that the
matched occurrences were detected in the process of generating cohabitation arcs.
Since this set must be computed anyway, and about the time brush sets are formed,
we have several options. The first is that at any history node, we may keep a list of
the cohabitation arcs that were formed because of the compilation of its arc. Then,
to exclude matched occurrences, we merely search the list of these arcs, which is
presumably much shorter than the opposing boundary set. Note that although this
scheme uses extra space, it is not an asymptotic increase, because it requires only one
more pointer per history node, and one more pointer per cohabitation arc.

A related method to detect matched occurrences is to utilize the fact that a match
between occurrences involves a single variable. This set of variables may be stored in
some data structure which allows rapid determination of the question: "Is this
variable in the set of matched occurrences?" Such a data structure would be a hash
table, or a sorted list which could be searched by binary division. The size of these
sets is proportional to the number of cohabitation arcs established, so again there is no
asymptotic increase in space requirements. This or the previous technique is useful if
the set of eliminated occurrences is small but the base boundary set is large.

The techniques described above eliminate the necessity of having to enumerate
two boundary sets in order to form one brush set. The problem remains that of the
perhaps many elements enumerated from the base propagation set, all but a few are
matched. This situation can be noted while doing the compilation of the arc, and if
too many elements of the base boundary set are matched, a direct representation of
the brush set, say as a list of occurrences, may be recorded in the history node.
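
The two representations, and a dynamic choice between them, can be sketched as follows. Everything here is an assumption made for illustration: the function names, the variable_of accessor, the use of a Python frozenset as the hash table of matched variables, and in particular the max_waste threshold, for which the text gives no value.

```python
def record_brush_set(base_boundary, matched_vars, variable_of, max_waste=0.5):
    """Choose a representation for a brush set when its arc is compiled.

    base_boundary: occurrences of the base boundary set.
    matched_vars: variables that were matched along the arc.
    variable_of: maps an occurrence to its variable.
    max_waste: hypothetical threshold on the fraction of the base
               boundary set that is matched.
    """
    base = list(base_boundary)
    kept = [o for o in base if variable_of(o) not in matched_vars]
    wasted = 1 - len(kept) / len(base) if base else 0.0
    if wasted > max_waste:
        # Too many elements would be suppressed: store the brush set itself.
        return ("direct", kept)
    # Otherwise store only the matched variables and filter on demand.
    return ("filtered", frozenset(matched_vars))

def enumerate_brush_set(record, base_boundary, variable_of):
    """Enumerate a recorded brush set; base_boundary is only consulted
    for the filtered representation."""
    kind, payload = record
    if kind == "direct":
        yield from payload
    else:
        for o in base_boundary:
            if variable_of(o) not in payload:
                yield o
```

The filtered form costs space proportional to the number of matches but enumeration time proportional to the base boundary set; the direct form reverses that trade.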

In each of the above cases, we are expanding space requirements in the hopes of
considerable improvement in enumeration time of brush sets. Precise definitions of
"large" and "small" will have to wait until we see the performance characteristics of
the compiler on real programs.

This  and  the  previous  three  sections  have  all  been  concerned  with  the  problem  of 
inconsistency  detection,  with  the  main  difficulties  arising  from  the  conflict  relation. 
We  have  taken  the  approach  of  being  very  careful  to  conserve  space,  and  the  history 
tree  has  been  the  key  to  this,  since  it  is  used  in  three  separate  but  related  ways. 
Although  prediction  of  the  behavior  of  a  large  program  is  difficult,  it  would  appear 
that  the  representation  of  conflict  and  algorithms  over  it  are  the  most  worrisome 
areas  of  the  efficiency  of  this  compiler.  But  having  the  flexibility  that  detailed 
conflict  information  allows  is  one  of  the  keys  to  this  optimization  strategy. 


7.  Inconsistency  Resolution 


7.1  Overview 

We saw in the previous chapter how to detect an inconsistency. Here we shall
study the problem of modifying the code so that it is once again consistent. The
means of doing this is usually the insertion of one or more instructions in the code.
Such a move may replace what was previously a CHB (cohabit) pseudo-op: we may
change CHB Vi,Uj to MOVE Vi,Uj. This corresponds to breaking a cohabitation arc
due to an assignment. We may also break a cohabitation arc that corresponds to
matching two occurrences V1→V2. In this case, the move has the form MOVE V1,V2.
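
As a small illustration of the first kind of move, the sketch below rewrites a CHB pseudo-op into a MOVE with the same operands. The representation of code as a list of (opcode, operand, operand) tuples and the function name break_cohabitation are assumptions made for this example only; the bookkeeping on the compiler's data structures that such a change requires is not shown.

```python
def break_cohabitation(code, index):
    """Return a copy of code in which the CHB pseudo-op at index has been
    replaced by a MOVE on the same operands."""
    op, first, second = code[index]
    if op != "CHB":
        raise ValueError("instruction at index is not a CHB pseudo-op")
    patched = list(code)                     # leave the original listing untouched
    patched[index] = ("MOVE", first, second)
    return patched
```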

In this work we shall discuss only those resolution techniques that rely on moving
data, whether by actual insertion of move instructions or by modifying other
instructions so that they do moves implicitly. Other techniques of inconsistency
resolution are available. For example, one might rearrange arguments of an
instruction: changing ADD Vi,Uj to ADD Uj,Vi may be a useful way to resolve an
inconsistency. Other techniques include reordering to reduce
conflict, and duplication of code (especially small subroutines) to reduce
cohabitation.

Whenever the code is modified, in particular by insertion of a move instruction,
the idea is to make all the data structures appear as if the code had been in that form
all along. Thus, a move instruction becomes a kernel region (or part of one), and
typically its source is a last use, while its destination is a generation and a first use.
We then look at all of the invariants concerning the compiler's data structure, and
make sure they remain true, modifying the data structure if necessary. This would
include perhaps adding new split and merge occurrences, forming new cohabitations
with any new occurrences added, listing these from the proper point in the history if
this is being done, updating cohabitation classes to have the proper set of members
and to have the proper "has generation" flag (remember, a move may add a
generation). Note that conflict is represented through the history tree, so that the
updating of conflict information is almost automatic (beware "last brush time"). In
fact, this ease of updating is one of the many advantages in the history tree
implementation of conflict. Any other representation seems quite awkward to revise
during resolution.


A further issue in the placement of move instructions is that it may occasionally
be desirable to place a move outside the region in which the inconsistency has
occurred. What we do in this case is to create a new kernel region or place the move
instruction in an existing region different from the one which has caused the
inconsistency. This causes no difficulty; eventually, everything is pieced together.

In summary, breaking a cohabitation arc means modifying the program, typically
by adding move instructions, and then ensuring that all of the invariants regarding
extra occurrences, cohabitation, and conflict are correct. Thus, compilation of any
arc is oblivious to whether there was previous inconsistency resolution.

7.2 Difficulties

The  above  sketch  gives  us  faith  that  if  we  can  only  decide  where  to  place  move 
instructions,  we  will  be  able  to  do  so  without  greatly  disrupting  the  compilation 
strategy  proposed  in  earlier  chapters.  The  real  problem  is  going  to  be  in  choosing 
where  to  place  the  move  instructions,  and  how  well  we  do  this  governs  the  quality  of 
the code we generate.

One  of  the  most  difficult  facets  of  the  resolution  problem  is  that  mere  knowledge 
of  the  inconsistent  cohabitation  and  conflict  relations  does  not  necessarily  tell  us  how 
to  resolve.  Consider  the  following  (wrong)  approach.  View  all  the  desired 
cohabitation arcs as in place, so that we have a cohabitation class with internal
conflict. Then, remove some subset of arcs, breaking the large class into two or more
pieces, each of which is internally free of conflict. Modify the code by placing move
instructions  between  the  occurrences  of  either  end  of  a  removed  cohabitation  arc. 

The reason that this approach does not work is that move instructions introduce
new generations into the program, those new generations introduce new conflict, and
part of this new conflict can in fact be internal to the smaller pieces of the original
cohabitation class, pieces which were hoped to be free of internal conflict. As an
example, we consider what happens in a conditional assignment of U to V:


[Figure: flowgraph and cohabitation graph for the conditional assignment]


Suppose that V1 and U1 are in class conflict, i.e., there are occurrences o1 and o2
which are in conflict and paths from these occurrences to V1 and U1.


If we look only at the above relations, it would seem that we can resolve the
inconsistency by breaking the cohabitation V1->V3. However, looking at the program,
this is ridiculous on the face of it. Further, if we actually install the move, we have
the relations


The conflict between V1 and U3 is because V1 is the destination of the move, and is
thus a generation. The conflict might also have been between V1 and U1. In either
case, the inconsistency remains, via the path:

    V1 -> V3 <- V2 <- U2 - U3 (or - U1)

On the other hand, choosing other arcs to break results in a perfectly acceptable fix
to the inconsistency. For example, breaking U2->V2 obviously will work, since it
corresponds to a literal implementation of the assignment.

It is clear that in order to resolve inconsistencies, we must develop a technique
which is able to anticipate the effect of the extra generations that are added as a
consequence of inserting moves.

7.3 Refinement of Partitions

The first step in resolving an inconsistency is to obtain an object which relates the
various occurrences in a class component to their relative appearance in the
flowgraph (assume desired cohabitation arcs are in place, throughout this chapter).
For instance, in the example in the previous section, it is important that V1 and U1
are in the same node, as are V3 and U3, etc. Another way to see why this is
important is to consider what it means to have an occurrence of one variable, say X1,
in an intermediate subgraph of another variable, say G(Y1,Y2). If we decide
to break Y1->Y2, it may make considerable difference whether the move is placed
"above" or "below" X1. It is this observation which leads to the idea of the common
refinement of the intermediate subgraphs of a class component. To introduce the
construction of this section, we first consider some relations of equivalence relations.

DeflnitlML  Let  and  ~2  be  equivalence  relations  over  a  set  £  We  say  that  ~|  is  a 
refinement  of  ~2  when 

a  ~i  h=>  e  ~2  b,  for  all  aJrtS 

An  equivalence  relation  ~  is  a  ammo*  refinement  of  /  *  1A-.  when  it  is  a 
refinement  of  each,  and  a  coarsest  common  refinement  ~  has  the  additional  property 
that  any  common  refinement  of  each  is  also  a  refinement  of  ~. 

□ 

It is a simple matter to show that a coarsest common refinement exists and is unique.
In fact, it is given by:

    a ~ b  <=>  a ~i b for all i
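
The characterization above translates directly into a computation on partitions. The following Python sketch is an illustration only and is not part of the original report; the function name and the representation of each equivalence relation as a list of disjoint sets are assumptions made for the example. Two elements fall in the same block of the result exactly when they share a block in every input partition.

```python
from collections import defaultdict

def coarsest_common_refinement(elements, partitions):
    """Return the coarsest common refinement of several partitions.

    elements:   iterable of hashable items (the set S).
    partitions: list of partitions, each a list of disjoint sets covering S.
    """
    grouped = defaultdict(set)
    for e in elements:
        # The "signature" of e is the index of its block in each partition.
        signature = tuple(
            next(i for i, block in enumerate(blocks) if e in block)
            for blocks in partitions
        )
        # a ~ b exactly when a and b agree in every relation.
        grouped[signature].add(e)
    return list(grouped.values())

# Hypothetical data: two partitions of {1, 2, 3, 4}.
p1 = [{1, 2}, {3, 4}]
p2 = [{1, 2, 3}, {4}]
blocks = coarsest_common_refinement({1, 2, 3, 4}, [p1, p2])
```

Here 1 and 2 stay together because both partitions keep them together, while 3 and 4 are separated by the second partition.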

We  now  relate  refinement  of  arbitrary  equivalence  relations  to  the  problem  at 
hand.  We  first  need  a  notation  for  the  part  of  a  cohabitation  graph  concerned  with 
a  particular  variable. 


Definition. Let H be any cohabitation graph constructed during compilation (we
specifically allow H to have internal conflict due to presence of desired
cohabitations). Let V be the variable of some occurrence of H. The restriction of H
to V, H_V, is the set of all occurrences (nodes) of H whose variable is V, together with
all arcs of H connecting two such occurrences. (Note that H_V may be disconnected.)
□ 

Because of the way that extra occurrences are added, any H_V subtends a certain
subgraph of the flowgraph, and induces a certain partitioning of this subgraph,
namely that given by Theorem 4.1. We now characterize the relevant properties of
the relationship between H_V and the flowgraph by abstracting the V-free terminology.
Let H be a graph (think H_V), and let ν be a map from the nodes of H to the
flowgraph (think of the function that takes an occurrence to the node in which it
appears). Then much of the terminology that has to do with a variable V can be
restated in terms of the "image of H under ν", which we abbreviate ν(H). The
following definitions are the major ones of interest. When H is in fact the
cohabitation graph H_V, these definitions correspond closely with earlier ones, where
"V" is changed to "H". The major difference is that it is H which controls the
liveness, and not the ordinary flow rules; we are interested only in the part of the
program "seen" by a particular cohabitation graph.

Definition. An H-free path (in the flowgraph) is one which has no element of ν(H),
except possibly for the first and last nodes. If p->q in H, the H-free subgraph (of the
arc or pair of nodes), denoted G(p,q), is the set of all H-free paths from ν(p) to ν(q).
(The notation "G(p,q)" suppresses any mention of H and ν, but this is clear from
context, since p and q must be nodes of H and ν is fixed.)

We say that H is live at an arc or a node if that arc or node is in some H-free
subgraph. Finally, H is a merge-split partition (or ms-partition) when:

    All of its H-free subgraphs have arcs.

    H-free subgraphs of distinct arcs of H intersect only when the arcs touch a
    common node N, and then intersect only at N.

    If H is live at a non-H node N, it is live at all arcs touching N.

(The "partition" is of the arcs of the live region of H, not of the entire flowgraph.)

□ 
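
To make the definition of G(p,q) concrete, the following Python sketch collects the flowgraph arcs that lie on some H-free path between two image nodes. It is an illustration under stated assumptions, not the report's implementation: the flowgraph is assumed to be a dictionary from each node to its successors, `image` stands for the set ν(H), and the function name is hypothetical.

```python
from collections import deque

def h_free_subgraph(succ, image, src, dst):
    """Arcs of the flowgraph lying on some H-free path from src to dst.

    succ:  dict mapping each flowgraph node to a list of successors.
    image: set of flowgraph nodes in the image of H.
    src, dst: the image nodes of p and q for an arc p -> q of H.
    """
    pred = {}
    for a, bs in succ.items():
        for b in bs:
            pred.setdefault(b, []).append(a)

    def reach(start, nbrs):
        # Nodes reachable from start without passing through an image node.
        seen, work = {start}, deque([start])
        while work:
            n = work.popleft()
            if n != start and n in image:
                continue  # an image node may end a path but not extend it
            for m in nbrs.get(n, ()):
                if m not in seen:
                    seen.add(m)
                    work.append(m)
        return seen

    fwd = reach(src, succ)   # reachable forward from src
    bwd = reach(dst, pred)   # reaching dst, searched backward
    arcs = set()
    for a in fwd:
        if a != src and a in image:
            continue         # interior nodes of an H-free path are not in the image
        for b in succ.get(a, ()):
            if b == dst or (b in bwd and b not in image):
                arcs.add((a, b))
    return arcs

# Hypothetical diamond-shaped flowgraph with image nodes a, d and e.
flow = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
img = {"a", "d", "e"}
g_ad = h_free_subgraph(flow, img, "a", "d")
g_ae = h_free_subgraph(flow, img, "a", "e")
```

In this example every path from a to e passes through the image node d, so the subgraph for the pair (a, e) is empty; H is live at an arc exactly when the arc appears in one of these sets.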

The following result is the analogue to Lemmas 4.1 and 4.2.


Lemma 7.1 Let H be an ms-partition, and let p->q in H. Then p dominates and q
back-dominates all arcs and internal nodes of G(p,q).

Proof. Also analogous to Lemmas 4.1 and 4.2. □

The  purpose  of  all  this  machinery  is  not  merely  to  duplicate  what  we  have  done 
for variables. Given a cohabitation graph H involving several variables, we want a
single graph H*, which somehow works for all the variables in H "at once". To do
this, we need the following notion.

Definition. Let H1, H2 be ms-partitions of the flowgraph. We say that H1 is a
refinement of H2 when:

    The live region of H1 includes the live region of H2.

    If p->q in H1, and H2 is live at some arc of G1(p,q), then all of G1(p,q) lies
    in the same H2-free subgraph.

□ 

The relation "is a refinement of" is a quasi-ordering (lacking only anti-symmetry),
analogous to the identically named relation of partitions. We can define "common
refinement" and "coarsest common refinement" (hereafter abbreviated ccr) in the same
way. Standard lattice theory results yield the uniqueness of coarsest common
refinements (up to equivalence in the quasi-ordering), and reduce the n-ary existence
question to one of binary existence. Before giving the details, we provide a rough
picture of a ccr H of H1 and H2. The live region of H is the union of the live regions
of H1 and H2. The nodes of H are the union of the nodes of H1 and H2, together
with "H-merge" and "H-split" nodes which have to be added. The arcs of H are
obtained in a natural way from those of H1 and H2. We begin the formal
construction of a ccr with the following algorithm.

Algorithm 7.1 Completed node set of H1, H2.

    Initially, the completed node set contains ν1(H1) ∪ ν2(H2).

    If N1 and N2 are in the node set and if there are two forward or two
    backward paths from N1 and N2 intersecting only at the node N, where the
    paths are H1-free and H2-free and lie in the union of the live regions of
    H1 and H2, then adjoin N to the completed node set.

This corresponds exactly to the construction for extra occurrences at V-merge and
V-split occurrences (see section 4.2). We now characterize the ms-partitions we are
interested in.


Definition. Let H1 and H2 be ms-partitions. A graph H and a map ν is a common
ms-partition of H1 and H2 when:

    ν(H) is the completed node set of H1 and H2.

    If n1 and n2 are nodes of H, there is an arc n1->n2 precisely when there is
    an H-free path from ν(n1) to ν(n2) where H1 or H2 is live at every arc on
    the path.

□ 

Theorem 7.1 Let H1 and H2 be ms-partitions. A common ms-partition of H1 and H2
is an ms-partition and is a ccr of H1 and H2. Conversely, any ccr of H1 and H2 is a
common ms-partition.

Proof. That H is an ms-partition is essentially shown in Theorem 4.3, because the
construction of a common node set for H is exactly what was done when adding
extra occurrences of the variable V. Note that Lemma 7.1 plays the role of Lemmas
4.1 and 4.2.

It is also not difficult to show that H is a refinement of H_i. Let H_i be live on the
arc A, so that A lies on some H_i-free path in G_i(p,q), where p->q in H_i. Let N_j be
the last node in the image of H appearing in this path before A, and let N_k be the
first such node after A. Since this path lies entirely in G_i(p,q), H_i is live at every
arc of it, so that by definition of common ms-partition, n_j->n_k in H, so that H is
live at A. Thus, the live region of H includes the live region of H_i, the first
requirement of refinement. The second requirement is the content of Corollary 4.3,
so that H is indeed a common refinement of the H_i.

What remains is to show that H is as coarse as possible. Let H0 be a common
refinement of the H_i. Its live region must contain the union of the live regions
of the H_i, which is exactly the live region of H, proving the first property required
in showing that H0 is a refinement of H. Let p->q in H0. We claim that G0(p,q) has no
internal nodes in the common node set. Thus, if H is live anywhere in it, all of
G0(p,q) lies in the same H-free subgraph, essentially by Corollary 4.3. We use an
inductive proof, following the inductive construction of the common node set. To
start the induction observe that G0(p,q) is H_i-free, since H0 is a refinement of H1 and
H2. Assuming the truth of the claim inductively, consider paths from nodes N1 and
N2 in the node set to a node N internal to G0(p,q). Since N1 and N2 are outside
G0(p,q), the paths must go through p (if they are forward) or through q (if they are
backward); this by Lemma 7.1 and the fact that H0 is an ms-partition. The paths
thus intersect at a point other than N, so that N cannot be added to the common
node set at this point. This completes the proof of the claim, and thus of the
Theorem.
□ 

A few remarks are in order concerning the relationship of this result and the extra
occurrences discussed in Chapter 4. The initial motivation given for extra
occurrences was to aid in finding the proper place to put moves, and the V-merge and
V-split nodes might have seemed to be an ad hoc construction. We can now see that
they arise in a completely natural way. The notion of a V-free subgraph is an almost
inevitable way to formalize the "data flow" of a variable from one use to a next use.
The definition of ms-partition formalizes the idea of partitioning the live region of a
variable using V-free subgraphs, and the definition of refinement of ms-partitions
generalizes the usual notion of refinement to the situation in which the domains of
the relation may overlap, but are not necessarily equal. If we take each V-free
subgraph individually, and consider all other occurrences of V to be isolated, we have
an ms-partition whose live region is just that one V-free subgraph. If we want to
divide up the entire live region of V, the coarsest common refinement of all such
ms-partitions is the only mathematically reasonable thing to do. This forces on us
the V-merge node and V-split node construction of Chapter 4.

In this chapter, our motivations are of course different. Here we want to relate
the partitions of the live regions of several variables to common parts of the
flowgraph. We can now define H* to achieve the effect we wanted.

Definition. Given a cohabitation graph H, the refinement H* of H is defined to be the
coarsest common refinement of the H_V, where V ranges over all the variables with
occurrences in H.

□ 

Note that the H*-free subgraphs are regions in which we have free choice regarding
the placement of moves involving variables of occurrences of cohabitation graph H.
The problem of deciding where to place moves factors into two problems: a certain
optimization problem on H* which will yield a choice of which variables to move on
which arcs, and then a code modification problem: given that a variable is to be
moved on an arc of H*, how do we best place code to do it in the H*-free subgraph
subtended by the arc.

7.4 Maximal Cohabitation Classes

Conflict often arises as a global phenomenon: a variable must retain its value
from an occurrence V1 to another occurrence V2, but there is an intervening
generation U1 of another variable. Thus far our representation of conflict has
retained this non-local feel. The refinement H* of a cohabitation class makes it
possible to represent conflict so that all conflict looks like kernel conflict. This is a
first step in seeing how to resolve an inconsistency of H.

Suppose that to each node of H* we attach an occurrence of all the variables of H
which are live at that point, where we use the occurrences already on the line of
code, if there are any, and make new ones for other variables. Extending the rule for
kernel conflict (page 33) to these new occurrences, any conflict within H will appear
as kernel conflict by the construction of H*. This kernel conflict is the seed of what
we call local conflict, where the term is chosen because it can be seen by looking only
at the node in question.

In what follows, it is convenient to imagine that whenever n1->n2 in H*, each
non-last occurrence of n1 has a cohabitation arc to some non-first occurrence of n2,
where this cohabitation arc always connects occurrences of the same variable. The
usual situation is that a variable will have only one occurrence among non-first or
non-last uses, so that the arc is redundant. The first application of these (perhaps
imaginary) cohabitation arcs is in the following result, which tells how local
conflict arises.

Lemma 7.2 Let n1->n2 in H*, and suppose we have occurrences V_i, U_i at n_i, i = 1 and
2, where V1 and U1 connect (respectively) to V2 and U2. If it is somehow known that
V1 and U1 cannot be in the same cohabitation class, we can also conclude that V2 and
U2 cannot be in the same cohabitation class, if the only change to the program is the
insertion of move instructions.

Proof. Suppose V2 and U2 are in the same cohabitation class, but that V1 and U1 are
not. Then G(n1,n2), which had no occurrences of V or U at the time of the
construction of the cohabitation graph, must have been modified with one or more
instructions which moved one or both of V and U into a common place. This
changes the value of one of the variables, violating program semantics. In terms of
conflict, the destination of the instruction(s) must have been a
generation of one of the variables, and so is in conflict with the other variable. But
if V2, U2 cohabit this is again an inconsistency.

□ 

Unlike many of the results and techniques of this paper, this lemma is decidedly
non-dual. Formally, this might be viewed as being a consequence of the fact that in
a move instruction, the source, i.e., last use, is not a generation, but the destination,
i.e., first use, is a generation. At a more philosophical level, the non-duality arises
because entropy always increases; in programs, this happens when a memory location
is clobbered.

Using the seeds of local conflict and Lemma 7.2 we can "grow" local conflict. At
each node of H* the algorithm below partitions its occurrences into what we call
maximal cohabitation classes, or mxcc's. These are maximal in the sense that no
matter how move instructions are inserted to remove inconsistencies, the final
cohabitation classes, restricted to any node, will be contained in a maximal
cohabitation class at that node. Initially, each generation is in a mxcc whose only
other occurrences are in kernel cohabitation with it. All other occurrences are
placed in a single mxcc. Then for any arc n1->n2 of H* we can obtain a new
mxcc-set at n2 from the one at n1 as follows:

Algorithm 7.2 Grow a mxcc-set

GR0 Initialize the derived mxcc-set to be the mxcc-set of n1.

GR1 If an occurrence dies out along n1->n2, remove it from its mxcc.

GR2 Replace each occurrence in the derived mxcc-set by the one of n2 to
    which it is connected by the cohabitation arc along n1->n2.

GR3 The only occurrences of n2 not presently in the derived mxcc-set are first
    uses at n2. If one of these occurrences "kernel cohabits" with some
    non-first use, put it in the same derived mxcc as the one with which it
    cohabits (this always applies if the first use is not a generation, in which
    case it is o1 in CHB o1,o2). The remaining occurrences, all generations,
    are put in mxcc's according to their kernel cohabitation relation.

GR4 Replace the mxcc-set at n2 by the coarsest common refinement (of simple
    partitions) of the current mxcc-set at n2 and the derived mxcc-set.


This algorithm applies only to one arc, and by itself is not an algorithm for getting
mxcc-sets everywhere. However, it fits into the general class of "weak interpreter"
techniques, the theory of which guarantees a "strong as possible" global set of
mxcc-sets that is consistent with the seeds and the growth rule. To calculate this
global set, we just continue applying the above rule till things settle down.
Algorithmic aspects will not be discussed further here. We do, however, prove that
such a global set of mxcc-sets tells us what we want to know about conflict.
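
As an aid to reading Algorithm 7.2, the following Python sketch applies steps GR0 through GR4 along one arc and then repeats the rule over all arcs until nothing changes. It is an illustration under stated assumptions, not the compiler's implementation: occurrences are plain strings, a mxcc-set is a list of sets, and the names `grow`, `settle`, `dies`, `link` and `kernel` are hypothetical. The data encode our reading of the conditional exchange example, with seed mxcc-sets formed by the initialization rule stated above.

```python
def ccr(p1, p2):
    """Coarsest common refinement of two partitions of the same set."""
    return [a & b for a in p1 for b in p2 if a & b]

def grow(mxcc_n1, current_n2, dies, link, kernel):
    """Apply GR0-GR4 once along an arc n1 -> n2; return the new mxcc-set at n2.

    dies:   occurrences of n1 that die out along the arc.
    link:   maps each surviving occurrence of n1 to its occurrence at n2.
    kernel: maps a first use at n2 to an occurrence it kernel cohabits with.
    """
    # GR0-GR2: copy the classes of n1, drop dying occurrences, follow the arcs.
    derived = [{link[o] for o in cls if o not in dies} for cls in mxcc_n1]
    classes = [cls for cls in derived if cls]
    # GR3: occurrences of n2 not reached by an arc are first uses at n2.
    placed = set().union(*classes)
    first_uses = sorted(set().union(*current_n2) - placed)
    classes += [{f} for f in first_uses]
    for f, g in kernel.items():
        cf = next((c for c in classes if f in c), None)
        cg = next((c for c in classes if g in c), None)
        if cf is not None and cg is not None and cf is not cg:
            cf |= cg  # join kernel cohabitants into one class
            classes = [c for c in classes if c is not cg]
    # GR4: refine the current mxcc-set at n2 by the derived one.
    return ccr(current_n2, classes)

def canon(partition):
    return sorted(sorted(c) for c in partition)

def settle(seed, arcs):
    """Reapply grow over all arcs until no mxcc-set changes."""
    mxcc = {n: [set(c) for c in classes] for n, classes in seed.items()}
    changed = True
    while changed:
        changed = False
        for n1, n2, dies, link, kernel in arcs:
            new = grow(mxcc[n1], mxcc[n2], dies, link, kernel)
            if canon(new) != canon(mxcc[n2]):
                mxcc[n2], changed = new, True
    return mxcc

# Conditional exchange: b is Z := X, c is X := Y, d is Y := Z (our encoding).
seed = {
    "a": [{"X0"}, {"Y0"}],
    "b": [{"Z1", "X1"}, {"Y4"}],
    "c": [{"X2", "Y1"}, {"Z3"}],
    "d": [{"Y2", "Z2"}, {"X4"}],
    "e": [{"X3", "Y3"}],
}
arcs = [
    ("a", "b", set(), {"X0": "X1", "Y0": "Y4"}, {"Z1": "X1"}),
    ("b", "c", {"X1"}, {"Z1": "Z3", "Y4": "Y1"}, {"X2": "Y1"}),
    ("c", "d", {"Y1"}, {"X2": "X4", "Z3": "Z2"}, {"Y2": "Z2"}),
    ("d", "e", {"Z2"}, {"X4": "X3", "Y2": "Y3"}, {}),
    ("a", "e", set(), {"X0": "X3", "Y0": "Y3"}, {}),
]
result = settle(seed, arcs)
```

The loop terminates because GR4 can only make the partition at a node finer, and a finite set has finitely many partitions. A production version would use a worklist rather than sweeping every arc on each pass.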

Theorem 7.2 Suppose we label the nodes of H* with mxcc-sets, beginning with the
seed mxcc-sets, and then by repeatedly growing new mxcc-sets. Suppose we eliminate
inconsistencies by the insertion of move instructions. Then:

    If o1 and o2 are both non-last occurrences at a node of H*, and are in
    different mxcc's before the modification, then they will not cohabit after
    the modification.

Proof. This is true of seed mxcc-sets because at least one of the occurrences will be
or will kernel cohabit with a generation, and by assumption, the other will not be a
last use. Thus, they will be in kernel conflict. Lemma 7.2 says that the same
property will hold of the derived mxcc-set constructed in GR1-GR3.

All that remains is to show that if the result holds for two mxcc-sets at a node of
H*, it holds at their coarsest common refinement. Let o1 and o2 be in different
mxcc's of the coarsest common refinement. By the remark in the previous section
about common partitions of sets, we know that o1 and o2 are in different mxcc's in at
least one of the two mxcc-sets. That mxcc-set tells us that o1 and o2 cannot cohabit
in a program changed only by the addition of move instructions. This proves the
desired property of the coarsest common refinement.

Incidentally, we can also observe that no information is lost in this step, i.e., the
conflict of both mxcc-sets is reflected in the coarsest common refinement. Suppose o1
and o2 are in the same mxcc of the coarsest common refinement. Then they are in
the same mxcc in both of the original mxcc-sets, so no stronger statement was
known previous to the replacement of the current mxcc-set by the new one. □

We now give an example of how this works. Return to the conditional exchange
example of section 3.4, page 16. We first give H*, with the initial mxcc-sets.


[Figure: H* for the conditional exchange example, with the initial mxcc-sets at
nodes a through e. X4, Y4 and Z3 are occurrences added because the variables are
live at their respective nodes.]


Note that at point a we are assuming that we somehow know that X0 cannot cohabit
with Y0. Suppose we derive a mxcc-set along a->b. By GR3, the derived mxcc-set will
be {Z1,X1},{Y4}, which is the coarsest common refinement of itself with the current
mxcc-set at b, and so replaces it, by GR4. Then consider b->c. At the end of GR1,
the derived mxcc-set is {Z1},{Y4}, while at the end of GR3, it has become {X2,Y1},{Z3},
which becomes the mxcc-set at c. Next, consider c->d. We see that the new mxcc-set
for d is {X4},{Y2,Z2}. When we process d->e, the mxcc-set for e becomes {X3},{Y3}.
Finally, consider a->e. The derived mxcc-set is {X3},{Y3}, which is the current mxcc-set.
Thus, no change occurs, and we have obtained a global assignment of mxcc-sets.


    a: {X0},{Y0}
    b: {Z1,X1},{Y4}
    c: {X2,Y1},{Z3}
    d: {X4},{Y2,Z2}
    e: {X3},{Y3}


As another example, we look at conditional assignment from section 7.2, page 59.
We assume that the seed of local conflict is node a.


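[Figure: H* for the conditional assignment example, labeled with mxcc-sets; node a
carries {V1},{U1} and node c carries {V3},{U3}.]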


Here the only change was to the mxcc-set on node c. For node b, V2 is a first use,
and V1 dies out along a->b.


7.5 Splits and Twists

Suppose we have H* labeled with mxcc's. We wish to modify this labeling in a
way which reflects the insertion of move instructions, so that we are finally able to
assign the mxcc's to cohabitation classes consistent with the generated code. We
begin by finding a condition on the mxcc's which describes this consistency. This
condition is most conveniently discussed in terms of the following object, which we
do not propose actually implementing.

Definition. The mxcc-graph has nodes which are mxcc's, and arcs induced from the
cohabitation relations on the elements of the mxcc's.

□ 

In terms of this graph, the consistency condition is that there not be an undirected
path in it between distinct mxcc's at the same node of H*. We will examine
inconsistencies by looking at the image in H* of the mxcc-path from the
inconsistency. The inconsistency condition is easier to work with when it is broken
down into two conditions. The first is one which can be seen along a single arc of
H*.

Definition. Let n1->n2 in H*. Suppose some mxcc of n1 has arcs to distinct mxcc's of
n2. We say that this mxcc splits along n1->n2.

□

An example of splitting is the conditional assignment example. Refer to the previous
figure. The derived mxcc of b along b->c is {V3,U3}, which is not a mxcc of node
c; instead, we find {V3}, {U3}.
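
The split condition can be checked one arc at a time. The Python sketch below is an illustration only; the function name and the data layout (a dictionary of mxcc-sets per node, and for each arc of H* a map from occurrences at n1 to the occurrences at n2 they connect to) are assumptions, and the data encode our reading of the conditional assignment example.

```python
def find_splits(mxcc, arcs):
    """Return (n1, n2, mxcc) triples where a mxcc of n1 splits along n1 -> n2.

    mxcc: dict node -> list of sets of occurrences.
    arcs: list of (n1, n2, link), where link maps surviving occurrences
          of n1 to the occurrences of n2 they are connected to.
    """
    splits = []
    for n1, n2, link in arcs:
        for cls in mxcc[n1]:
            # Collect the distinct mxcc's of n2 reached from this class.
            targets = {
                frozenset(c)
                for o in cls if o in link
                for c in mxcc[n2] if link[o] in c
            }
            if len(targets) > 1:
                splits.append((n1, n2, frozenset(cls)))
    return splits

# Conditional assignment: node b is V := U, so V2 and U2 kernel cohabit.
mxcc = {
    "a": [{"V1"}, {"U1"}],
    "b": [{"V2", "U2"}],
    "c": [{"V3"}, {"U3"}],
}
arcs = [
    ("a", "b", {"U1": "U2"}),              # V1 dies out along a -> b
    ("b", "c", {"V2": "V3", "U2": "U3"}),
    ("a", "c", {"V1": "V3", "U1": "U3"}),
]
found = find_splits(mxcc, arcs)
```

On this data the only split reported is the class {V2,U2} along b->c, in agreement with the discussion above.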


It is clear from the definition of mxcc-graph that each mxcc-arc "belongs" to a
certain arc of H*. Note that when there is a split, the path between distinct mxcc's at
a node belongs to a cycle of arcs in H*, consisting of one arc repeated twice: first
backward, then forward. We know that splitting does not cover all inconsistencies,
for if we look at the mxcc-sets computed for the conditional exchange example (page
69), we see that there is no splitting. But we know that something must be wrong,
because there is inconsistency. The problem is captured as follows.

Definition. An undirected path in the mxcc-graph between distinct mxcc's at the
same node of H* is called a twist if the cycle of H*-arcs to which it belongs is simple
(has no repeated arcs).

□ 

We draw the mxcc-graph for the conditional exchange example, which motivates the
term twist (mentally fill in mxcc labels in the same order as they were listed in the
previous section).


This  could  be  drawn  most  symmetrically  on  a  Moebius  band. 
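
The consistency condition on the mxcc-graph (no undirected path between distinct mxcc's at the same node of H*) amounts to a connected-components test. The following Python sketch uses a union-find structure for this; it is an illustration under assumptions, with hypothetical names, and although the text does not propose implementing the mxcc-graph, building it for a small example shows the twist. The data are the global mxcc-sets of the conditional exchange example as we read them.

```python
def inconsistent_nodes(mxcc, arcs):
    """Nodes of H* at which two distinct mxcc's are joined by an undirected path.

    mxcc: dict node -> list of sets of occurrences.
    arcs: list of (n1, n2, link); each pair in link induces a mxcc-arc.
    """
    parent = {(n, frozenset(c)): (n, frozenset(c)) for n in mxcc for c in mxcc[n]}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def class_of(n, o):
        return next((n, frozenset(c)) for c in mxcc[n] if o in c)

    # Join the two ends of every mxcc-arc.
    for n1, n2, link in arcs:
        for o1, o2 in link.items():
            parent[find(class_of(n1, o1))] = find(class_of(n2, o2))

    bad = []
    for n, classes in mxcc.items():
        roots = {find((n, frozenset(c))) for c in classes}
        if len(roots) < len(classes):
            bad.append(n)
    return bad

exchange = {
    "a": [{"X0"}, {"Y0"}],
    "b": [{"Z1", "X1"}, {"Y4"}],
    "c": [{"X2", "Y1"}, {"Z3"}],
    "d": [{"X4"}, {"Y2", "Z2"}],
    "e": [{"X3"}, {"Y3"}],
}
exchange_arcs = [
    ("a", "b", {"X0": "X1", "Y0": "Y4"}),
    ("b", "c", {"Z1": "Z3", "Y4": "Y1"}),
    ("c", "d", {"X2": "X4", "Z3": "Z2"}),
    ("d", "e", {"X4": "X3", "Y2": "Y3"}),
    ("a", "e", {"X0": "X3", "Y0": "Y3"}),
]
bad = inconsistent_nodes(exchange, exchange_arcs)
```

Every node is reported, although no mxcc has arcs to two distinct mxcc's along any single arc: following the arcs once around the cycle a, b, c, d, e carries the class of X0 to the class of Y3, which the arc a->e ties back to Y0.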

Theorem 7.3 Suppose a mxcc-graph is free of splits and twists. Then there are no
inconsistencies (and the components of the mxcc-graph are the desired cohabitation
classes).

Proof. Suppose we have an inconsistency, i.e., an undirected path of mxcc-arcs
between two mxcc's on the same H*-node. We may assume that this path is of
minimal length. We want to find either a split or a twist. Look at the cycle in H* to
which the undirected path belongs. If the H*-cycle has no repeated arcs, we have a
twist and are done. Suppose some arc in the H*-cycle is repeated, and examine the
mxcc-arcs belonging to it. If these arcs share a mxcc, they must both leave that
mxcc, because distinct arcs can't enter a mxcc after the growing of local conflict
which was described in the previous section. Thus, this mxcc is split and we are done.
The only other possibility is that there are four distinct mxcc's touched by these two
mxcc-arcs. Start at any one of these four mxcc's, and follow the mxcc-path until one
of the four is encountered. This mxcc-path cannot come back to the same mxcc or
to the mxcc on the other end of the mxcc-arc, or we would have a mxcc-cycle,
meaning that the original undirected path did not actually contain all four mxcc's.
There are essentially two possibilities, most easily described by their pictures (ellipses
represent mxcc's at a single node of H*):


[Figure: the two possible configurations, each marking the starting mxcc]


In the first case, we may extend the mxcc-path by one more arc, and wind up at a
distinct mxcc at the same node of H*. In the second case, we wind up at such a
mxcc just by the path. In both cases, we have contradicted minimality of the length
of the mxcc-path.


The decomposition of the problem of inconsistency resolution into splits and twists
sets the stage for the rest of this work. In the next chapter, we will consider the
problem of split removal. Given H*, and the information of the accompanying
mxcc-graph, the techniques of that chapter will tell how best to insert moves so that
if H* is re-derived, there will be no splits. It seems likely that split-removal will
resolve most inconsistencies, although we know that it cannot resolve all of them.
Chapter 9 discusses the problem of untwisting. The techniques there assume an H*
and a mxcc-graph that is free of splits, and tell how best to insert moves resulting in
consistent cohabitation and conflict relations. Synergistic interactions between
split-removal and untwisting are not considered. This question will have to be
reopened if empirical evidence refutes the intuition that little would be gained by
such techniques.


8.  Split  Removal 

8.1 The Two Variable Case

We begin our discussion of split removal with the special case in which H* has only
two variables, neither of which is replicated. In this case, each node of H* has at
most two occurrences, and if there are two, there is either one mxcc or two. A split
always has the following form:


[Figure: a node with the single mxcc {V,U}, joined by an arc (neither V nor U dies
along this arc) to a node with the two mxcc's {V}, {U}.]


There may be several such splits. Form the modification subgraph M of H*.

Algorithm 8.1 Construct a two variable modification subgraph

Initialize M to all arcs of H* along which a split occurs.
for n ← the top node of each split
    Adjoin to M all undirected paths starting at n such that
        (1) both variables are alive along an arc, and
        (2) both variables are in the same mxcc at a node.
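
Algorithm 8.1 can be read as an undirected search outward from the top node of each split. The following is only a minimal sketch under an assumed encoding that is not part of the report: arcs of H* are (tail, head) pairs, `both_live` is the set of arcs along which both variables are alive, and `same_mxcc` is the set of nodes at which both variables share one mxcc. The function name and all parameter names are hypothetical.

```python
from collections import deque

def modification_subgraph(arcs, split_arcs, both_live, same_mxcc):
    """Return the arcs of M for the two-variable case (assumed encoding)."""
    m = set(split_arcs)                      # initialize M to the split arcs
    # Undirected adjacency, restricted to arcs satisfying condition (1).
    neighbors = {}
    for tail, head in arcs:
        if (tail, head) in both_live:
            neighbors.setdefault(tail, []).append((head, (tail, head)))
            neighbors.setdefault(head, []).append((tail, (tail, head)))
    # The top node of a split is taken to be the tail of its arc.
    start = [tail for tail, _ in split_arcs]
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for other, arc in neighbors.get(node, []):
            if other in same_mxcc:           # condition (2) at the node reached
                m.add(arc)
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return m
```

For instance, with arcs s→p, p→t and p→q, a split along p→t, both variables alive on s→p and p→t, and a shared mxcc at s and p, the sketch returns the two arcs s→p and p→t.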

This  subgraph  has  the  following  important  property. 

Theorem 8.1 Let n1 be a node of M at which there is a CHB whose arguments are V
and U. Let n2 be a node of M at which V and U are in different mxcc's, and consider
any undirected path between n1 and n2. Suppose that the code is modified by the
insertion of moves (explicit or otherwise) in such a way that there are no remaining
splits and no replications. Then there is a move inserted in the H*-free subgraph
subtended by some arc oriented away from n1 on the path. In pictures:

[Figure: an undirected path from n1 to n2, with one arc labeled "an arc not oriented
away from n1" and another labeled "an arc oriented away from n1".]

Proof. Let H*' be the ms-partition computed after the moves are inserted; grow
mxcc-sets in H*'. The proof is based on comparing H*' to H*, the original
ms-partition. All of the nodes along our undirected path essentially lie in H*', but
the arcs between two nodes may be replaced by a finer subgraph, because of extra
occurrences added if a move is inserted internally in G(ni,nj), where ni and nj are
adjacent on the undirected path. Nevertheless, if ni→nj there will be a path in H*'
from ni to nj, and dually if nj→ni.

In the undirected path, the first arc must leave n1, not enter, because if n1 has a
CHB, the left operand is not live along any arc entering n1. Thus, if n1 has been
modified (the CHB changed to a move), the Theorem holds. Otherwise, the mxcc-set
of n1 as a node of H*' will have only one mxcc, because the CHB is intact, and
because there are no replications. Let n3 be the last node along the undirected path
which has only one mxcc in H*' (we may have n3=n1), and let n4 be the next node.
Note that n4→n3 is impossible, since there would be a directed path from n4 to n3 in
H*', and by the Growth Rule, if there are two mxcc's at n4, there must be two at n3
(remember, both V and U are live through G(n4,n3), and insertion of moves does not
change this). Thus, we must have n3→n4, and the move instruction must appear in
G(n3,n4), as desired. □

The reason that we are interested in this Theorem is that it shapes how we look
for code modifications. Consider the conditional assignment example started in
section 1.2 and for which we computed mxcc-sets in section 7.4. The only split is
along the arc b→e. If we compute the modification subgraph for this case, we see
that it contains only this arc. Thus, there is essentially no choice in how to
resolve the inconsistency: we change the CHB to a move instruction.

A further restriction on where the modifications occur is given in the following
result.

Lemma 8.1 Let M be constructed as above. If moves that are added to the code do
not create replications, those moves do not appear in H*-free subgraphs subtended by
arcs in strongly connected components of M.

Proof. A node containing a CHB has no incoming arcs, and is thus not in a strongly
connected component (scc). Thus, a move in an scc will be of the form MOVE V,V or
MOVE U,U. The picture is:


[Figure: an scc containing the instruction MOVE V1,V2, between nodes n1 and n2.]

Since V is live on the entry and exit arcs of this scc, n1 will be a V-merge node and n2
will be a V-split node. Let V3 be the merge occurrence of V at n1, and V4 be the
split occurrence at n2. We claim that V1 and V2 are both not last uses, i.e., that the
move replicates V. If V1 is a last use, the move instruction is superfluous, and would
not have resolved an inconsistency. Suppose V2 is a last use. Then V1 must cohabit
with V4, since this is the only choice. Since we also have V3 cohabiting with V2 and
V4 with V3 (by the cycle through the scc), we see that V1 cohabits with V2. This is
also absurd, since it too would not resolve an inconsistency (in fact, it formally causes
one, since V1 and V2 are in intra-line conflict). Thus, V2 is not a last use, and the
move replicates V, a contradiction. □

It must be noted that it is occasionally useful to create replication in just the above
way. However, it is a second-order optimization, and is not treated here.

The previous two results allow us to approximate an optimal solution to the two
variable replication-free split removal problem, by converting it to an efficiently
solvable graph problem. Assume that along any arc of M not in an scc, the cost of
moving V is equal to the cost of moving U. Then we can call this the common cost
of the arc of M. For arcs which are in scc's, we assign a cost of infinity, meaning
that no move is allowed on the arc. Call all of the nodes of M having cohabitations
of V and U sources, and call nodes of M having two mxcc-sets sinks. What we are
interested in is:


Definition. A split-removal modification (or srm) is a set S of arcs having the
property that every undirected path from a source to a sink contains an element of S
oriented away from the source.
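
The definition can be checked mechanically. The sketch below assumes that "oriented away from the source" means the arc is traversed in its own direction while walking from the source toward the sink; under that reading, S is an srm exactly when no sink can be reached from a source by walking any arc backwards and only arcs outside S forwards. The function name and the (tail, head) arc encoding are assumptions made for illustration.

```python
def is_srm(arcs, sources, sinks, s):
    """True if every undirected source-to-sink path uses an arc of s forwards."""
    s = set(s)
    step = {}
    for tail, head in arcs:
        # Walking an arc against its direction never counts as crossing s.
        step.setdefault(head, []).append(tail)
        # Walking an arc along its direction is blocked when the arc is in s.
        if (tail, head) not in s:
            step.setdefault(tail, []).append(head)
    seen = set(sources)
    stack = list(sources)
    while stack:
        node = stack.pop()
        if node in sinks:
            return False                     # found a path that avoids s
        for nxt in step.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True
```

On the two-arc graph s→a, t→a with source s and sink t, the set {s→a} passes the check, while {t→a} does not, because the path from s to t crosses t→a against its orientation.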


We want the srm of minimal cost. This is very close to the maximal-flow-minimum-cut
problem, the difficulty being that here we are quantifying over undirected paths
and oriented arcs, whereas the standard max-flow-min-cut works on directed paths
and oriented arcs, or undirected paths and unoriented arcs. We can convert our
problem to the fully directed case by a simple technical device.

Lemma 8.2 Let M be a graph whose arcs have costs and whose nodes are labelled as
sources, sinks, or neither, where all sources and sinks are not in scc's. Obtain M0
from M in the following way: for every arc of M, adjoin an oppositely directed arc
with infinite cost. Then the set of srm's of M is equal to the set of finite-cost
cut-sets (in the usual flow-theoretic sense) of M0.

Proof. There is an obvious bijection between the set of undirected paths of M and
directed paths of M0, so that when we consider finite-cost srm's and cut-sets, the
correspondence remains true. The only thing remaining to show is that a finite cost
cut-set maps to a finite cost srm. This holds because all arcs in M0 and not in M have
infinite cost. Thus a finite cost cut-set of M0 contains only arcs of M; these
obviously constitute an srm of M, by the bijection of undirected paths of M and
directed paths of M0. □
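
As an illustration of how Lemma 8.2 might be used, the sketch below builds M0 by adjoining an infinite-cost reverse arc for every arc of M, attaches a super-source and super-sink (an assumption needed to handle several sources and sinks at once), and runs a plain breadth-first augmenting-path max-flow. The report does not prescribe a particular flow algorithm; the function name and the dictionary encoding of arc costs are hypothetical.

```python
from collections import deque
import math

def min_cost_srm(arc_costs, sources, sinks):
    """Return (cost, arcs) of a minimum-cost srm, or None if none is finite."""
    INF = math.inf
    top, bottom = ("super", "source"), ("super", "sink")
    cap = {top: {}, bottom: {}}

    def add(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)
        cap[u][v] += c

    for (u, v), c in arc_costs.items():
        add(u, v, c)          # the arc of M, with its common cost
        add(v, u, INF)        # the adjoined reverse arc of M0
    for s in sources:
        add(top, s, INF)
    for t in sinks:
        add(t, bottom, INF)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {top: None}
        queue = deque([top])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if bottom not in parent:
            break
        path, v = [], bottom
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        if push == INF:
            return None       # some path cannot be cut at finite cost
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

    # Arcs of M leaving the source side of the final residual graph form the cut.
    cut = {(u, v) for (u, v) in arc_costs if u in parent and v not in parent}
    return sum(arc_costs[a] for a in cut), cut
```

With costs {s→a: 3, a→t: 1}, source s and sink t, the sketch selects the arc a→t at cost 1.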

Now that we know how to efficiently compute an optimal srm, we show that it does
in fact yield the desired effect on the program. The following is a partial converse
of Theorem 8.1.

Theorem 8.2 Given an optimal srm, suppose we insert a MOVE V1,V2 at the point in
the flowgraph corresponding to each arc of the srm. Then H*', constructed as in the
proof of Theorem 8.1, is free of splits and replications.

Proof. The crucial part of this proof is to show that every directed path from a
source to a sink encounters exactly one element of the srm. Suppose we can show
this. Then the srm partitions M, and thus the flowgraph, into places where V and U
cohabit, i.e., are on a directed path from a CHB without an intervening element of
the srm, and those places where V and U are in different mxcc-sets, i.e., where there is
a directed path to a sink, which is someplace in the unmodified flowgraph where we
knew that V and U cannot cohabit. The boundary between these two pieces is
exactly where instructions of the form MOVE V1,V2 are in place, and so this whole
arrangement is what is obtained by the growth rule.


To prove the result, we obtain a contradiction from assuming that some forward
path in M from a source to a sink encounters two arcs of the srm. We make heavy
use of the optimality assumption. First, we introduce the concept: a node n is
source-separated if any undirected path between a source and n has some arc of the
srm, oriented away from the source. Call the dual concept sink-separated. Let n1 be
pointed to by an arc A1 in the srm on a directed path from a source to a sink having
two srm arcs, and let A1 not be the last such arc. We claim that n1 is not
source-separated. If it were, we claim that A1 can be dropped from the srm, and the
remainder will still be an srm. The only undirected paths that this would affect are
those containing A1 oriented away from the source. But in such paths, since n1 is
source-separated, we know that there is another arc of the srm, properly oriented.
Thus, A1 is unnecessary in the srm, contradicting its optimality. We conclude that n1
is not source-separated.

Let A2 be the next arc in the srm on the forward path after A1, and let n2 be the
node which A2 leaves. We may dually conclude that n2 is not sink-separated. But then
we have an srm-free undirected path from a sink to n2 (n2 is not sink-separated), and
from n2 to n1 (because A2 is the next arc in the srm after A1), and from n1 to a source
(n1 is not source-separated). But this contradicts the assumption that we were given
an srm. Thus, we can conclude that on any directed path from a source to a sink, there
is only one arc of an optimal srm. □


The reader should be aware of the fact that an optimal srm may have several arcs on
an undirected path from a source to a sink.


[Figure: an example graph with its sources and sinks marked; circled arcs constitute
the srm, and numbers are costs.]


8.2 The Difficulty of Split Removal

In this section we formulate the general split removal problem, and show that
it is NP-hard. The result is of course not tremendously useful in designing the code
generator, other than halting the search for an efficient algorithm. However, the
proof is instructive both in showing the source of the combinatorial difficulty, and in
suggesting approximation heuristics.

The split removal problem involves both the ms-partition H* and the associated
mxcc-graph. Two nodes and a connecting arc of H* have associated mxcc's and
mxcc-arcs, depicted thus:


[Figure: an upper node with mxcc's {V,U,X}, {Y,...}, ... joined by mxcc-arcs to a
lower node with mxcc's {V}, {U,X}, {Y,...}, {Z,...}.]


(Occurrence subscripts have been omitted.) When we are worrying about split
removal, the algorithm for growing mxcc-sets has already been applied. Thus we may
see splits, as in the mxcc {V,U,X} above, but we will never see arcs from different
mxcc's going into the same mxcc.

We think of split removal as effected by the insertion of a set of moves having the
property that after they are inserted and mxcc-sets regrown, there are no splits. The
moves have the effect of breaking the cohabitation arc into the destination
mxcc-class. In the above case, the destination of the move might be the mxcc-set
{U, X} on the lower node. This removes the split, as we can see locally. It is also
possible to break up a mxcc well in advance of a split, in which case, it can be seen
to remove the split only by growing mxcc-sets.

In purely graph-theoretic terms, a split removal can be thought of as a set of sets
of cohabitation arcs, each individual set of cohabitation arcs belonging to a common
H*-arc and originating in a common mxcc. In the above example, a set of
cohabitation arcs would be the singleton V→V arc, or the set consisting of the U→U
and X→X arcs. Each set of cohabitation arcs corresponds to a single move instruction,
so that a split removal set is defined to have the property that removing the arcs,
partitioning the mxcc-sets at the destinations, and growing, leads to a split-free
mxcc-graph. The cost of a split removal might depend in a complicated way on the
costs of the cohabitation arcs involved, but we will prove NP-hardness in the
restricted case where the cost depends only upon the H*-arc.

Definition To color a graph is to assign integers (colors) 1, ..., x to its nodes such
that adjacent nodes do not receive the same color. A minimal coloring is one which
has a minimum value of x. We say that x is the chromatic number of the graph.

□
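
To make the definition concrete, here is a minimal sketch that checks a proposed coloring and finds a minimal coloring by exhaustive search. It is exponential and intended only for very small graphs; the function names and the edge-list encoding are illustrative assumptions, not anything used in the report.

```python
from itertools import product

def is_coloring(edges, color):
    """True if no edge joins two nodes of the same color."""
    return all(color[a] != color[b] for a, b in edges)

def minimal_coloring(nodes, edges):
    """Return (x, coloring) with x minimal, or None if no coloring exists."""
    nodes = list(nodes)
    if not nodes:
        return 0, {}
    for x in range(1, len(nodes) + 1):
        # Try every assignment of colors 1..x to the nodes.
        for assignment in product(range(1, x + 1), repeat=len(nodes)):
            color = dict(zip(nodes, assignment))
            if is_coloring(edges, color):
                return x, color
    return None               # only possible when some edge is a self-loop
```

A triangle needs three colors under this search, while a three-node path needs two.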

The problem of minimally coloring a graph is known to be NP-complete [3]. We
shall show how to transform a graph to be colored into a split removal problem, such
that the solution of the split removal problem will give a minimal coloring of the
graph, thereby proving that the split removal problem is NP-hard. Given a graph, we
view each node i as corresponding to variable V(i), i = 1, ..., n, where n is the number
of nodes. We construct H* thus:


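[Figure: a top node with mxcc-set {V(1), ..., V(n)} leading into a loop; an exit arc
from the loop ends in a lower right node with mxcc-set {V(i)}, {V(j)}.]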

We have depicted the mxcc-set at the top node; this same mxcc-set is also the
mxcc-set at all the nodes in the loop. We have also depicted the mxcc-set on the
lower right node. This corresponds to an arc between nodes i and j of the graph to
be colored. In fact, for every arc in the graph to be colored, we have two exit arcs
from the loop, each ending in an H*-node with mxcc-set like the above, and each
node of the pair connected to a common node. Thus, if the graph to be colored has
a arcs, the H* constructed above has 2a exit arcs from the loop, 2a more arcs after
these, 2a + 1 arcs in the loop, and one more arc A, on top. The cost on the arcs in
the loop we put at infinity; all other arcs have a cost of 1 (this simply means that
the frequency is so low that the cost of inserting a move is simply the space for it). It
is clear that the desired H* can be constructed in polynomial time. Note that
growing mxcc-sets would change nothing.


Now, suppose we are given an (optimal) solution to the split removal problem, and
that the mxcc-sets have been reinitialized in accordance with the inserted moves, and
regrown. Let us consider variables V(i), V(j) where (i,j) is an arc in the graph to be
colored. Suppose that V(i) and V(j) are still in the same mxcc at the entry node of
the loop. Since no moves have been inserted in the loop, there must be moves on
both of the two exit arcs for (i,j). But if this is the case, we can improve the solution
by removing the two moves and placing a single move on arc A, contradicting the
optimality of the solution. We conclude that if (i,j) is an arc of the graph to be
colored, V(i) and V(j) are in different mxcc's at the entry node to the loop.

The correspondence with the coloring problem now follows. A solution of the
split removal problem minimizes the number of mxcc's in the mxcc-set at entry to the
loop. We color each node of the graph to be colored by its mxcc; by what we have
said, this is a coloring of the graph. Conversely, any coloring of the graph in x
colors can be turned into a split removal with cost x-1. This proves

Theorem 8.3 The split-removal problem is NP-hard.

8.3 The Eventually-Separate Relation

We have seen that the two-variable case of split-removal is easy, and that the
general case is hard. In later sections of this chapter we shall outline an approximate
solution to the difficult case. It will reduce split removal to several max-flow-min-cut
problems and several graph coloring problems. Known heuristics may be applied to
the graph-coloring problems (see [4]). These represent some of the intrinsic difficulty
of the split removal problem. When there are only two variables, the graph coloring
problems are easy, and the approximation algorithm reduces to the algorithm
proposed earlier, so it is exact.

In this section, we will introduce a relation that is important in constructing both
the max-flow-min-cut and the coloring problems. We begin by looking at a split. As
our canonical example, we will use


[Figure: node N with mxcc's {V,U,X,Y,Z}, ... joined by mxcc-arcs to a successor node
with mxcc's {V}, {U,X}, {Y}, ...]

The split induces a relation on the occurrences of a mxcc at N, which we may draw
as a graph:

[Figure: the graph of this relation on the occurrences V, U, X and Y.]


We call this relation eventually-separate because there is a forward path from N that
eventually leads to a node where occurrences so related are in separate mxcc's.

Suppose that there is an assignment Y←Z at N, which we see as a CHB Y,Z and an
intra-line cohabitation from Z to Y. It is clear that Z is eventually-separate from
anything from which Y is eventually-separate, and we explicitly include the pairs
(V,Z), (U,Z) and (X,Z) in the eventually-separate relation at N. This is called
completing the relation. To summarize what we have said so far:
Algorithm 8.2 Initialize eventually-separate relation.

for A ← each arc in H*
    for m ← each mxcc at the beginning of A
        for o1,o2 ← each pair of occurrences in m
            if o1 and o2 map into different mxcc's along A
                make o1,o2 eventually-separate
        Complete the eventually-separate relation of m
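
The inner part of Algorithm 8.2 (one arc A and one mxcc m) can be sketched as follows. The encoding is an assumption made for illustration: `dest_mxcc` maps each occurrence to an identifier of the mxcc it maps into along A (last uses are simply absent), `intra_line` lists intra-line cohabitation arcs as (from, to) pairs, and a related pair is stored as a frozenset, so a self-loop is a one-element frozenset.

```python
def initialize_eventually_separate(mxcc, dest_mxcc, intra_line):
    """Return the completed eventually-separate relation on one mxcc."""
    occs = list(mxcc)
    rel = set()
    # Initialization: pairs that map into different mxcc's along the arc.
    for i, o1 in enumerate(occs):
        for o2 in occs[i + 1:]:
            d1, d2 = dest_mxcc.get(o1), dest_mxcc.get(o2)
            if d1 is not None and d2 is not None and d1 != d2:
                rel.add(frozenset((o1, o2)))
    # Completion: for an intra-line cohabitation from src to dst, src is
    # eventually-separate from anything dst is eventually-separate from.
    changed = True
    while changed:
        changed = False
        for src, dst in intra_line:
            for pair in list(rel):
                if dst in pair:
                    other = next(iter(pair - {dst}), dst)
                    new = frozenset((src, other))
                    if new not in rel:
                        rel.add(new)
                        changed = True
    return rel
```

Applied to the canonical example, with V, U, X, Y mapping into {V}, {U,X}, {U,X}, {Y} and an intra-line cohabitation from Z to Y, the sketch produces the five pairs of the split plus (V,Z), (U,Z) and (X,Z).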

Observe that non-splits result in null relations. A two-way split produces bipartite
complete graphs within a mxcc; it is one of the places in which the assumption of
small mxcc's plays a role in practicality. Even with four occurrences in a mxcc, the
largest number of eventually-separate pairs is six.

Continuing  our  above  example,  let  us  look  at  what  might  occur  at  a  node 
preceding  N,  and  its  corresponding  eventually-separate  relation: 


(Evidently X and Y are dead along this arc.) We will modify the eventually-separate
relation at N1 by pulling back the relation at N and completing it.

Before giving the algorithm for propagating the eventually-separate relation, we
discuss an interesting consequence of completing it: it is possible for an occurrence
to be eventually-separate from itself. This happens whenever there is an intra-line
cohabitation between two occurrences that are eventually-separate. As an example of
this, look at the cohabitation graph of the first figure of section 1.2 and its
corresponding H* in the last figure in section 7.4. Since there is an intra-line
cohabitation arc from U2 to V2, and since V2 and U2 are eventually-separate (by
initialization), U2 will be eventually-separate from itself. Thinking of how this would
appear in the graph of the relation, we say that U2 has a self-loop on it. During split
removal, any self-loop will at some point be in a source node of a modification
subgraph. We saw this in the conditional assignment example of section 8.1, page 74.
The algorithm for propagating the eventually-separate relation is phrased so that
self-loops are not propagated.

Algorithm 8.3 Propagate eventually-separate relation along A

for m ← each mxcc at the beginning of A
    for o1,o2 ← each pair of distinct occurrences in m
        if o1 and o2 map to eventually-separate occurrences along A
            make o1,o2 eventually-separate.
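
One step of Algorithm 8.3, for a single mxcc at the beginning of A, might look like the sketch below. It assumes the same frozenset encoding of related pairs, with `image` mapping each occurrence to the occurrence it becomes at the end of A (occurrences that die along A are absent). Because only distinct pairs with distinct images are considered, self-loops are never pulled back. All names are hypothetical.

```python
def propagate_eventually_separate(mxcc, image, rel_at_head, rel_at_tail=()):
    """Pull the relation at the end of an arc back to one mxcc at its start."""
    rel = set(rel_at_tail)
    occs = list(mxcc)
    for i, o1 in enumerate(occs):
        for o2 in occs[i + 1:]:
            i1, i2 = image.get(o1), image.get(o2)
            if i1 is None or i2 is None or i1 == i2:
                continue                     # dead along A, or not a distinct pair
            if frozenset((i1, i2)) in rel_at_head:
                rel.add(frozenset((o1, o2)))
    return rel
```

Repeating this step over all arcs until no relation changes corresponds to the backward growth of the relation; completion after each step is left out of the sketch.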

Continuing our above example, we would get a new eventually-separate relation at
N1:


Just as we propagated the growth of mxcc-sets forward, we propagate growth of the
eventually-separate relation backward, until the relation stabilizes. Along an arc A of
H* from N to N', the eventually-separate relations of several mxcc's of N' may
contribute to the same mxcc at N. However, since the mxcc-sets have been "grown"
(Algorithm 7.4), a given mxcc of N' can affect at most one mxcc at N. After
growing the eventually-separate relation, we will have an eventually-separate relation
on each mxcc of each node of H* with the following property: if all first uses
(corresponding to dead variables on the incoming arc) are removed, the relation maps
backward along incoming arcs to a sub-relation (think, sub-graph) on a mxcc on a
previous node of H*.


We said in the introduction to this section that graph colorings would enter into
the split removal process. The graphs that are colored are (almost) the
eventually-separate relations on mxcc's. The colors correspond to the cohabitation
classes that will exist once splits are removed. In some situations, colors on
occurrences at a node of H* merely signify the cohabitation classes into which the
cohabitation class of an occurrence is eventually copied; in others, differently colored
occurrences at a node of H* will be in different cohabitation classes.

Technically speaking, it is impossible to color a graph with self-loops, so it is not
always possible to exactly color the eventually-separate relation. Further, the
"coloring" that we can't quite do must be propagated from node to node of H*,
reflecting the cohabitation classes that we are trying to form. This propagation runs
into other difficulties. Both sets of difficulties are taken care of as we construct a
modification subgraph (the subject of the next section) analogous to the one we used
in the two variable case. In the general case, we will make several such constructions.

8.4 Construction of a Modification Subgraph

The construction of a modification subgraph is an attempt to get a good
approximation to a problem that is known to be difficult. The philosophy of the
construction is to derive as much power as we can from the network-flow technique,
which we know provides an optimal solution in the two variable case. The strategy is
to group together occurrences in each mxcc into two subsets, black and white. The
black occurrences as a whole act as a variable, and the white occurrences as a whole
act as another variable. There are cohabitation arcs between black and white
occurrences only at source nodes in the derived network-flow problem, just as in the
two variable case, there are cohabitation arcs between the two variables only at a
source node. Corresponding to the situation that two variables are in different mxcc's
at a sink, we will construct the modification subgraph so that at a sink, a black and a
white occurrence are never in the same mxcc.

In the absence of an implementation, it is possible to make only plausibility
arguments for this approach. The main argument we make is that there are not too
many occurrences at a node of H*, and the modification subgraph never departs too
far from the two variable case. Realize that if there are several occurrences at a node
of H*, they will have different variables, and there must be some assignment that
caused the cohabitation. It is hard to imagine a real program where more than three
or four variables have all been assigned together. Even with assignments generated
by the compiler, say to model parameter passing, a half dozen occurrences at a single
node of H* seems an extreme number. There are several places in this and the next
section where we invoke the smallness of mxcc-sets, usually to justify not worrying
too hard about choices to be made. As we know from the two variable case and the
proof of NP-hardness, as the number of occurrences at a node of H* grows, so does
the unlikelihood of finding a reasonable approximation.

We begin the construction of a modification subgraph M at a split. As in the two
variable case, a split corresponds to an arc of M entering a sink. Here, there are
arbitrarily many variables, and complications arise that were not seen previously.
The first of these is that because a node of H* may have several mxcc's that might be
broken up by the insertion of move instructions, we must think of M as a subgraph of
the mxcc-graph, not of H*. (In two-variable split removal, the only part of the
mxcc-graph where it made sense to put moves was where there were two active
variables, and thus only one mxcc. This made the mxcc-graph correspond exactly to
H*.) Strictly speaking, M is not really a subgraph of the mxcc-graph, because there is
only one arc of M entering a sink, where the mxcc-graph has a fan-out:


[Figure: in the mxcc-graph, m fans out to m1, m2 and m3; in M, a single arc leaves m
and enters the sink.]

With this abuse of terminology understood, we continue to call M a modification
subgraph, but of the mxcc-graph, not H*.

A split is always associated with an arc A of H* and a mxcc m belonging to the
node at the beginning of A. The size of a split associated with A,m is one less than
the number of mxcc-arcs leaving m and belonging to A (so the size is zero if there is
no split). We describe a construction for M that reduces the size of the starting split,
perhaps to zero. This same M may also, serendipitously, reduce the size of other
splits. Inserting the move instructions corresponding to a cut of M may divide the
mxcc-graph into two connected components. Whether or not this happens, the total
size of splits in the new mxcc-graph(s) will be less than the original. After several
iterations of constructing a subgraph and inserting the move instructions
corresponding to it, the total size of splits will be reduced to zero.
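
The size measure that drives this iteration is simple to compute. In the sketch below each mxcc-arc is encoded as a triple (H*-arc, tail mxcc, head mxcc); that encoding, the function names, and the choice to report size zero when no mxcc-arc leaves m along A are all assumptions made for illustration.

```python
def split_size(mxcc_arcs, h_arc, m):
    """One less than the number of mxcc-arcs leaving m that belong to h_arc."""
    leaving = sum(1 for arc, tail, _head in mxcc_arcs if arc == h_arc and tail == m)
    return max(leaving - 1, 0)

def total_split_size(mxcc_arcs):
    """Sum of split sizes over every (H*-arc, mxcc) pair that has mxcc-arcs."""
    starts = {(arc, tail) for arc, tail, _head in mxcc_arcs}
    return sum(split_size(mxcc_arcs, arc, tail) for arc, tail in starts)
```

The iteration described above stops once `total_split_size` reaches zero.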

In the general case a split may take a single mxcc m at a node N of H* to several
mxcc's mi at a subsequent node N'. We shall choose at least one of the mi to be
black, and one to be white. By the smallness of mxcc-sets, there are probably only
two mi. In the unlikely event that there are more than two, black and white may
be assigned to the others arbitrarily. The occurrences in each of the mi receive the
color of the mi, and the occurrences in m are colored according to the occurrence
they correspond to in one of the mi. Initializing M thus produces a situation like


[Figure: the colored mxcc m and the mxcc's mi after initialization.]


The M that we construct will correspond to the freedom we have in inserting move
instructions that separate black from white along this split.

There may be occurrences in m that are last uses, and so will not be given colors
by the above rule. We will eventually assign these occurrences either black or white,
but on the basis of what can be seen between N and N', there is no reason to choose
either one. It is convenient to assign last uses arbitrary distinct colors. As the
construction of M proceeds, these colors may be merged together, or may be merged
with black or white. But as we shall see, we never merge black with white.

Since colors correspond to cohabitation classes, the next step of the algorithm is to
merge colors of occurrences that are connected by an intra-line cohabitation arc (for
simplicity, assume that there is at most one CHB per node of H*). This may result in
some of the last uses becoming the same color, or black, or white. If this rule says to
merge black with white, we don't. Rather, by analogy with the two variable case, m
is labeled a source. In this case, M consists only of m, the mi, and an arc connecting
them; there is no choice about where to put the move to reduce or remove this split.
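
The merging rule can be sketched as follows, assuming colors are plain labels with the two distinguished values "black" and "white", and last uses carry other arbitrary labels. The function name, the label strings, and the (from, to) encoding of intra-line cohabitation arcs are hypothetical. The sketch never merges black with white; it reports instead that m should be labeled a source.

```python
def merge_colors(color, intra_line):
    """Merge colors along intra-line cohabitation arcs.

    Returns (new_color, is_source), where is_source is True when some arc
    would have merged black with white.
    """
    color = dict(color)
    is_source = False
    for o1, o2 in intra_line:
        c1, c2 = color[o1], color[o2]
        if c1 == c2:
            continue
        if {c1, c2} == {"black", "white"}:
            is_source = True                 # do not merge; m becomes a source
            continue
        # Prefer black or white over an arbitrary last-use color.
        keep = c1 if c1 in ("black", "white") else c2
        drop = c2 if keep == c1 else c1
        for occ, c in list(color.items()):
            if c == drop:
                color[occ] = keep
    return color, is_source
```

For example, a last use Z with an intra-line cohabitation arc to a black occurrence Y becomes black, while an arc between a white U and a black V leaves the colors unchanged and marks m as a source.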


Suppose that black and white do not intra-line cohabit at N. Then there remains
the possibility of inserting a move instruction to separate black and white on a path
leading to N. If this is done, then the black and white occurrences at N will end up
in separate mxcc's after growing mxcc-sets, and of course this growth will propagate
forward on all paths leaving N. Thus, before including in M anything before N, we
investigate what would happen during the growth of mxcc-sets from N, if black and
white occurrences were in separate mxcc's. This investigation is accomplished by a
forward propagation of the colors of occurrences at N. (Colors other than black and
white are not propagated, because occurrences with these colors are last uses.) As
colors are propagated forward along a mxcc-arc, all of the non-first occurrences in
the destination mxcc receive colors. The assignment of colors can be extended to all
the occurrences in the mxcc by propagation along intra-line cohabitation arcs.

We first consider the case in which the occurrences of the mxcc are all black or all
white. The entering mxcc-arc is not a place where inserting a move instruction will
separate black from white, so this mxcc-arc is not included in M, and the scan does
not continue from this point. However, for reasons that become clear later, the
colors are left on the occurrences.

Let A be an H* arc from N to N_1. Suppose that every mxcc-arc leaving m and
belonging to A ends in an all black or all white node. This is called a complete split.


The forward step taken here causes a situation that looks exactly like the
original split. Naturally, we create a new sink m_1 for A, and connect m to it.

We next consider the case where the destination mxcc of the mxcc-arc receives
both a black and a white occurrence.


We ask whether some pair of the black and white occurrences of m_1 are eventually-
separate. If not, we will argue that as an approximation, m_1 should not be included
in M. The part of the program reachable from N_1 does not reach a split for the
mxcc m_1 (else the occurrences would be eventually-separate), meaning that black and
white can cohabit as far as [...] is concerned. It seems unlikely that an optimal split
removal would separate occurrences that can cohabit. Thus, we terminate the
forward scan, and do not include m_1 in M. Further, we argue that black should not
be separated from white before N, for the reason that they would also be separated
on paths starting from N, specifically, those leading to N_1 and beyond. Thus, we
label m a source and terminate the construction of M. In this situation, as in the case
where black and white intra-line cohabit at N, there is no choice about where to
reduce the split.

The remaining case is that each black-white pair of occurrences at N_1 are in
different mxcc's or are eventually-separate. In this case, we continue the forward
scan from m_1. If a mxcc-arc leaving m_1 arrives at a mxcc that is not already colored,
then we have the same cases that we had as we left m, and this is true in general as
we scan forward. The new case is that we may encounter a mxcc that has already
been colored.

The simplest and most pleasant case that arises in an already colored mxcc is that
the cohabitation arcs along the mxcc-arc being scanned connect black with black, and
white with white. In this case, the arc is included in M, and the forward scan
continues along other paths. A related possibility is that the scan arrives back at m,
and some of the black or white nodes propagate to last uses there. The colors of
these last uses are merged with black or white, as required by the cohabitation arcs.
As long as this can happen without an attempt to merge black and white, we have
the simple pleasant case.

Suppose, though, that a black occurrence is carried by a mxcc-path to a white
occurrence at the same node. If this occurs, we have met a problem alluded to
earlier: the propagation of colors (specifically, black and white) cannot be done
consistently. This is called a latent twist, for a reason we now explain. Suppose we
separate black from white (with a move instruction) on some path leading to the arc
which caused the black-white merge. Then after split removal and growth of mxcc's,
we would have a twist, and there would have to be extra moves (exchanges) to resolve
the inconsistency. Now, it is conceivable that all this might be part of an optimal
inconsistency resolution, but so unlikely that the construction of M excludes the
possibility. The point is that we can dictate that the occurrences involved in a latent
twist always cohabit, and still resolve the inconsistency; thus the term "latent".
Following previous reasoning to ensure that occurrences in a latent twist cohabit, we
label m a source, and delete from M all the structure that was added on the scan
forward from m.

To summarize the forward scan from m, the effect is either to consistently assign
colors to occurrences in every mxcc reachable on a forward path from m, stopping at
all black and all white mxcc's, and to include all of this part of the mxcc-graph as
part of M, or to label m a source, include none of this part of the mxcc-graph in M,
and to terminate the construction of M (leaving it with only m, m', and the
connecting arc).

If the construction of M is not complete, the next step is to scan backward from any
mxcc m that has already been included in M. We take a backward step from a mxcc
m, i.e., consider a mxcc-arc entering m, only when the coloring at m can be
consistently propagated along all forward paths. After the forward scan from the top
of the split (also called m), all of the nodes included in M during the scan, as well as
the top of the split, enjoy this property.

Let nt be a mxcc having an exiting arc that enters m. If nt has been colored, the
only possibility is that it is a mxcc that is part of a sink or is an all white or all black
node at which the forward scan stopped. This bizarre case looks like the following
(cohabitation graph on the right, corresponding mxcc-graph on the left):


Because mxcc-sets have been grown, back propagated occurrences arrive at a single
mxcc, so that if back propagation to a sink occurs, it will connect black and white.
This is another version of a latent twist; we cannot propagate colors consistently. As
we did before, we stop the propagation by labeling some node a source, in this case
nt. This seems peculiar, because nt is also part of a sink. However, it must be
correct, because it allows us the flexibility of inserting a move instruction between nt
and m, m and m_1, and m_1 and the sink, each of which is a reasonable place to
remove the split. The modification graph looks like this:


[Figure: the modification graph, running from the source to the sink]


Like  other  latent  twists,  this  probably  is  unlikely  to  happen  in  real  programs. 

The ordinary case is that the occurrences of nt have not already been colored. In
this case, we include in M the mxcc nt and the mxcc-arc between it and m. The
colors at m propagate backward, so we get colors at all occurrences of nt except the
last uses (as at the top of a split) and occurrences that split off to some mxcc other
than m (a new phenomenon in the general case). These occurrences are given new
distinct colors. As before, we merge colors according to intra-line cohabitations at
nt, unless this would merge black and white, in which case we make nt a source, and
do not continue a backward scan from it. Again as before, we do not scan backward
from nt until we scan forward. This forward scan is much the same as from the top
of a split. If the scan encounters a latent twist or black and white occurrences that
are not eventually-separate, then everything adjoined to M since the start of the scan
from nt is excluded from M, and nt is labeled a source. A forward scan in the
general case can encounter a source node (on the first forward scan, there were no
sources). The reason that a node is labeled a source is that the split should be
removed after that point, because otherwise a merge of black and white is implied.
Thus, if a forward scan from nt encounters a source, we also conclude that nt should
be a source, and as usual, we exclude from M everything that was adjoined since the
start of the forward scan.

In addition to the possibility of encountering sources, there is another
complication that we previously did not have to consider: some of the colors may be
neither black nor white. We now review forward propagation, incorporating this
extra generality. Suppose a mxcc does not receive a black-white pair of eventually-
separate occurrences. If there are only black or only white occurrences, we have the
situation as before, and take the same action. Otherwise, there is a dilemma.
On the one hand, we cannot include the mxcc in M and continue the forward scan,
because this part of the mxcc-graph would no longer represent places where an
inserted move instruction would separate black from white. On the other hand, if we
merely quit scanning, there is the possibility that a merge of colors, made because of
a construction elsewhere in M, might cause an eventually-separate black-white pair to
appear in the mxcc, in which case we should have propagated forward.

The solution is to continue the forward scan, merging colors according to
cohabitation arcs, but not include any of the structure in M. We call this a tentative
forward scan, and say that we tentatively include part of the mxcc-graph in M. Like
the entire forward scan, it may be necessary to abort a tentative forward scan, for
example, if a latent twist is discovered. If the beginning mxcc of a tentative scan has
both a black and a white occurrence, then aborting the tentative scan causes an abort
of the entire forward scan. Otherwise, a more benign approach to the abort may be
taken. When we were considering the case with only black and white, the scan
stopped when a mxcc was all black or all white. Thus, when aborting a tentative
scan, we merge together all the colors at the beginning mxcc and eliminate the
tentative part of M included in the scan. This may cause the mxcc to become all
white or all black, or all some other color. The mxcc will still become a source if it
is reached on a backward step, since its monochromaticity would cause a merge of
black and white.

We briefly review what can happen during a tentative forward scan. If a source or
a latent twist is encountered, the tentative scan is aborted. If a step is taken to a
mxcc that receives only one color, then even the tentative scan stops; the mxcc and
the arc to it are not tentatively in M. No matter what merges of colors occur in the
construction of M, a forward scan would not include these in M, and would not
continue from here. Thus, we will maintain the rule that a mxcc tentatively in M
always has distinct colors, just as mxcc's in M always have black and white. However,
merges of colors later in a tentative scan, or even later in the construction of M, can
cause this to be violated. Merges of colors must therefore be accompanied by a
check on whether they violate this rule. If so, the mxcc's tentatively in M and arcs to
them are no longer tentatively in M. It is as if the tentative scan had never gone
beyond this point. An efficient implementation of the check on merges and possible
pruning of the tentative scan is a programming problem not considered here. If a
mxcc becomes all black or all white, it is possible that a mxcc leading to it now has a
complete split. This possibility must be checked, and a sink added to M if it occurs.

It is possible for a forward step in a tentative scan to arrive at a mxcc that is
already in M. We merge colors along the mxcc-arc as usual; this may detect a latent
twist, aborting the tentative scan. If not, the merges guarantee an eventually-separate
pair of black-white occurrences at the mxcc from which the forward step was taken,
and in fact at any node on a path from the beginning of the tentative scan. Thus, if
the tentative scan terminates without aborting, we check to see if there is now an
eventually-separate black-white pair. If so, everything in the tentative scan that has
such pairs is included in M. The rest of M will consist of pieces, each having a root
mxcc with the property that all the colors of a piece appear in the root mxcc. (If
none of the tentative scan is included in M, the root mxcc is the beginning mxcc of
the forward scan.) Because of the tentative scan, any merge of colors of a root mxcc
will not lead to a latent twist in the tentative part of M forward of the root. If all
the colors are merged to black or all to white, that tentative part of M will return to
its unscanned state. If some become white and some black, then some of the
tentative part will be included in M, up to a complete split or to mxcc's that become
roots of smaller tentative parts of M.

Let us suppose that we finish a tentative scan from a mxcc, but that the mxcc
remains tentatively in M. Then we leave everything tentatively in M. This means
that another forward scan may find a mxcc tentatively in M. If the merges along the
arc detect a latent twist, the forward scan is aborted (if it is a tentative scan, the
tentative part is aborted). Otherwise, the merges are completed. If the forward scan
was in a non-tentative part, it may continue. If the forward scan was tentative, it
need not go beyond this point. Because the tentative part of M is closed in the
forward direction, a backward step will never reach a mxcc tentatively in M.

After M is closed under backward and (perhaps tentative) forward steps, we have
essentially completed its construction. It remains to decide what to do with the
tentative parts of M, but this is more naturally considered in the next section.
Omitting the complicated details of forward scan, we now summarize this section by
the following algorithm.


Algorithm 8.4 Construct a modification subgraph M
    Initialize M at a split
    Scan forward from the top node of the split
    for A <- mxcc-arc not in M entering a non-source mxcc in M
        m <- the end of A not yet in M
        Include A, m in M
        Scan forward from m
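As a rough illustration, the outer loop of the algorithm above might be sketched as follows. The forward scan itself is deliberately left out: the callbacks entering and scan_forward, and the representation of M as sets of nodes, arcs and sources, are hypothetical stand-ins chosen for this sketch, not the report's data structures. The special case of a predecessor that is already in M is simply skipped here.

```python
def construct_modification_subgraph(top, sink, entering, scan_forward):
    """Grow a modification subgraph backward from an initial split.

    top, sink     : the two mxcc's of the initial split.
    entering(m)   : hypothetical callback yielding the mxcc's that have an
                    arc into m.
    scan_forward(m, nodes, arcs, sources)
                  : hypothetical callback standing in for the forward scan;
                    it may add to nodes and arcs, or add m to sources.
    """
    nodes, arcs, sources = {top, sink}, {(top, sink)}, set()
    scan_forward(top, nodes, arcs, sources)
    changed = True
    while changed:                       # repeat until no backward step applies
        changed = False
        for m in list(nodes):
            if m in sources:
                continue                 # no backward step from a source
            for pred in entering(m):
                if pred in nodes:
                    continue             # already-colored case, omitted in this sketch
                nodes.add(pred)
                arcs.add((pred, m))
                scan_forward(pred, nodes, arcs, sources)
                changed = True
    return nodes, arcs, sources
```

Each pass either adds a new mxcc or stops, so the loop terminates on any finite mxcc-graph.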

The occurrences in each mxcc in M or tentatively so are colored. Call a non-sink,
non-source mxcc an interior mxcc. Each interior mxcc has occurrences with distinct
colors. An interior non-tentative mxcc has at least one black and one white
occurrence. Two occurrences belonging to interior mxcc's and connected by a
cohabitation arc (inter-line or not) have the same color.

A final issue to be discussed is the assignment of costs to the arcs of M. Each arc
A of M corresponds to some set of cohabitation arcs. For arcs in a strongly
connected component of M, we use a cost of infinity, for the reasons outlined in
section 8.1. For other arcs, we use the minimum of the costs on the associated
cohabitation arcs, following our usual philosophy of optimism in choosing costs.
Since all the cohabitation arcs correspond to the same arc in the flowgraph, it is
likely that the costs will all be the same. This breaks down only when the variables
in a cohabitation class are asymmetric in some respect. For example, in inter-region
compilation, some of the variables may be assumed to be in registers, and others not.
The only reason that two such variables are considered to be cohabiting is the
technique we have been using to resolve an inconsistency.
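The cost rule can be illustrated by a short sketch. We assume, purely for illustration, that M is given as a dictionary from arcs (tail, head) to the list of costs on the associated cohabitation arcs; the function names are hypothetical. An arc lies in a strongly connected component exactly when its head can reach its tail, which the sketch tests by a plain depth-first search.

```python
import math

def reachable(adj, start, goal):
    """Depth-first search: can goal be reached from start?"""
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        if u == goal:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def assign_arc_costs(cohab_costs):
    """cohab_costs maps each arc (u, v) of M to the costs of its
    cohabitation arcs; returns one cost per arc of M."""
    adj = {}
    for u, v in cohab_costs:
        adj.setdefault(u, []).append(v)
    costs = {}
    for (u, v), cs in cohab_costs.items():
        # The arc is on a cycle iff its head can get back to its tail.
        on_cycle = reachable(adj, v, u)
        costs[(u, v)] = math.inf if on_cycle else min(cs)
    return costs
```

The per-arc search is quadratic in the worst case; a single strongly-connected-components pass would do the same job faster, but M is expected to be small.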

Our construction of M systematically excludes one type of cohabitation arc: an
intra-line cohabitation arc at any source node. During the construction of a kernel
region, such arcs may be given very low costs because we have some trick in mind for
breaking the arc; thus we really must include this information in M. To do so, we
can add a new source to M and a new arc leading from it to the old source, where
the cost on the new arc is that of the intra-line cohabitation. The old source is then
relabeled as a non-source. In this way, M can be made to reflect the information
about inexpensive intra-line cohabitations. (To simplify exposition, this matter was
not mentioned in section 8.1.)
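The adjustment for an inexpensive intra-line cohabitation is a small graph edit, sketched below. The representation (a set of source names and a dictionary of arc costs) and the function name are assumptions made for this example only.

```python
def split_source(sources, arc_costs, old_source, intra_line_cost, new_source):
    """Return updated (sources, arc_costs) with a fresh source placed in
    front of old_source, which is relabeled as a non-source."""
    if old_source not in sources:
        raise ValueError("old_source must currently be labeled a source")
    new_sources = (set(sources) - {old_source}) | {new_source}
    new_costs = dict(arc_costs)
    # The new arc carries the cost of breaking the intra-line cohabitation.
    new_costs[(new_source, old_source)] = intra_line_cost
    return new_sources, new_costs
```

The inputs are copied rather than modified, so the caller can compare the subgraph before and after the edit.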

As an example of the techniques of this section, we consider the construction of a
modification subgraph for the split-removal problem generated by the NP-hardness
proof of section 8.2. Choosing a split in the mxcc-graph corresponds to choosing an
arc in the graph that was to be colored. Recall that the nodes of this graph correspond
to variables in the split-removal problem. The construction for M will assign the same
color to every occurrence of the same variable, and different colors to occurrences of
different variables. There will be one other sink besides the initial one. There is no
tentative part of M, because each of the arcs leaving the loop corresponds to a
(non-exact) split where each mxcc-arc has only one occurrence. After shrinking arcs
with infinite cost (those in the loop), M will have the form:


[Figure: the resulting form of M, with its sinks]


8.5 The General Case of Split-Removal

In the previous section, we saw how to construct a modification subgraph, and
obtain an assignment of colors to its occurrences that is preserved by cohabitation
arcs. If the resulting construction has only the colors black and white, then we have
a situation like that of 8.1, even though we did not begin with two variables. We use
the techniques of that section to insert move instructions that remove the splits
between black and white. Inserting these involves modifying the cohabitation
relations, and thus the mxcc-sets. After regrowing mxcc-sets, there may still be
remaining splits, unlike the two variable case. If so, a new modification subgraph is
constructed, and the process is repeated. By the remarks made at the beginning of
the previous section, the number of times this process can be repeated is limited by
the total size of all splits in the original mxcc-graph, and is expected to be small.

The real purpose of this section is to discuss the case in which there are colors
other than black and white. We first discuss the case where M has no tentative parts.
We will still insert move instructions on a minimum cut of the modification subgraph;
all mxcc's below this point will be divided into two mxcc's, one mxcc having no
black occurrences, and one having no white occurrences. Occurrences with other
colors will be in one mxcc or another. This must of course be done so that the
cohabitation arcs are consistent with the mxcc-arcs, but there is still some choice for
how to do this. Suppose that after insertion of move instructions, a red occurrence is
in the same node as a black occurrence at some mxcc. By following cohabitation
arcs forward and backward, we will see that red occurrences must always be in the
same mxcc as a black occurrence. Thus, we may view red as being merged into
black. Conversely, we may choose to merge any non-white occurrence into black, or
vice versa, and obtain a valid mxcc-graph after insertion of move instructions.

The choice of merging a color with black or with white will clearly lead to
different mxcc-graphs after insertion of move instructions. One choice is likely to
lead to a more optimal overall solution to split-removal than another. How can we
figure out what to do? We can offer only a heuristic and a plausibility argument.
Before doing so, we observe that the construction of the modification subgraph
produces as many distinct colors as possible, within the constraint that colors follow
cohabitation arcs. Once a cut is chosen, we are worried only about
cohabitations below the cut, and are not constrained by any merges of colors that
occurred because of cohabitations above the cut. To give the heuristic maximum
flexibility, the effect of these merges is undone, perhaps yielding even more distinct
colors.

We are guided by the idea of coloring the eventually-separate relation. Assume
that at some mxcc, each of the occurrences has a distinct color. Momentarily
forgetting about these colors, use different colors to (minimally) color the eventually-
separate relation on the mxcc (ignoring self-loops). If the chromatic number is less
than the number of nodes, then two nodes have the same color. Remembering again
the original set of colors, we observe that the black and the white occurrence are
eventually-separate (by construction of M), and so will receive different (new) colors.
The basic heuristic is this:

    If two occurrences receive the same new color, then merge the
    corresponding original colors.

This will never merge black with white. The plausibility argument for the heuristic
involves the consideration of what happens after growing mxcc-sets in the new
mxcc-graph. If the eventually-separate relation is divided into two pieces in this way,
then the sum of the chromatic numbers of the pieces will be equal to the chromatic
number of the original. In other words, the heuristic insures that the number of
cohabitation classes does not increase, as it might for some other division of the
eventually-separate relation.
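The basic heuristic can be sketched as follows for a single mxcc. Since mxcc's are expected to be small, the sketch finds a minimal coloring by exhaustive search; this is our simplification, and any minimal-coloring procedure could be substituted. The function names and the representation (a dictionary from occurrence to original color, and a list of eventually-separate pairs) are hypothetical.

```python
from itertools import product

def minimal_coloring(nodes, edges):
    """Exhaustively find a proper coloring with the fewest colors.
    Exponential, so only suitable for the small graphs expected here."""
    nodes = list(nodes)
    for k in range(1, len(nodes) + 1):
        for assignment in product(range(k), repeat=len(nodes)):
            col = dict(zip(nodes, assignment))
            # Self-loops are ignored, as in the text.
            if all(col[a] != col[b] for a, b in edges if a != b):
                return col
    return {}

def merges_from_heuristic(original_color, eventually_separate):
    """original_color: occurrence -> original color at one mxcc.
    Returns the pairs of original colors that the heuristic merges."""
    new = minimal_coloring(original_color, eventually_separate)
    occs = list(original_color)
    merges = set()
    for i, a in enumerate(occs):
        for b in occs[i + 1:]:
            if new[a] == new[b] and original_color[a] != original_color[b]:
                merges.add(frozenset((original_color[a], original_color[b])))
    return merges
```

Because the black and the white occurrence are eventually-separate, they are joined by an edge and always receive different new colors, so the returned pairs never include black with white.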


It may of course happen that several of the occurrences in a mxcc already have
the same color, either because of a cohabitation in M, or because they were merged
together by the heuristic. We do not renege on making these colors identical; rather,
we ask whether any further merging is allowed by the eventually-separate relation.
Put differently, form an induced relation on the original colors at a mxcc, defining
two colors to be eventually separate if any two occurrences with those respective
colors are eventually-separate. We may color the induced relation with new colors,
and apply the same heuristic.

It is necessary to apply the heuristic at most once on each node. After having
done so, the induced relation will be a complete graph and will remain so after
subsequent merges of colors. A complete graph has a chromatic number equal to the
number of nodes; applying the heuristic step will not merge any colors. Conversely,
an induced relation that is not a complete graph has a chromatic number less than
the number of nodes, so applying the heuristic will cause a merge of colors.
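A small sketch of the induced relation and the completeness test may help. The names and data layout (occurrence-to-color dictionary, list of eventually-separate occurrence pairs) are assumptions made for this illustration.

```python
def induced_relation(color_of, eventually_separate):
    """Lift the eventually-separate relation from occurrences to colors.
    Pairs of occurrences that share a color are dropped (self-loops)."""
    rel = set()
    for a, b in eventually_separate:
        ca, cb = color_of[a], color_of[b]
        if ca != cb:
            rel.add(frozenset((ca, cb)))
    return rel

def is_complete(colors, rel):
    """True when every pair of distinct colors is related, in which case
    the heuristic has nothing left to merge at this mxcc."""
    colors = sorted(set(colors))
    return all(frozenset((x, y)) in rel
               for i, x in enumerate(colors)
               for y in colors[i + 1:])
```

When is_complete returns False, some pair of colors is unrelated, so a minimal coloring uses fewer colors than there are nodes and the heuristic merges at least one pair.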

After the heuristic is applied at each node, we know that the induced relation is a
complete graph everywhere, but there is still the possibility that there are non-white,
non-black colors. How should these be merged with black and white? It is difficult
to formulate a further rule on the basis of what is seen in M, partly because a
complete graph is symmetric on the node set. Experience may suggest further
heuristics, but smallness of mxcc-sets indicates that the proposed heuristics result in
only two colors in almost every practical case. A first implementation can choose the
remaining merges arbitrarily.

We now turn to the problem of deciding what to do with the tentative parts of M.
We use exactly the same idea of finding a minimal coloring and using it to induce
merges. Given a root mxcc, it would seem reasonable to choose a mxcc leading
immediately to a root mxcc as a place to start. As before, this is only a heuristic, and
is not guaranteed to eliminate all the tentative parts of M. Remaining choices can be
made arbitrarily.


It is interesting to apply this heuristic to the split-removal problem generated by
the NP-hardness proof of section 8.2. The eventually-separate relation for nodes in
the loop will be the graph that was to be colored. The modification subgraph will
assign distinct colors to each node, among them black and white; the heuristic will
merge the colors of occurrences that are colored the same. Regardless of how we
group the merged colors with black and white, we get two graphs, the sum of whose
chromatic numbers equals the original. The split-removals that remain after the first
split-removal will continue to obey an optimal coloring of the split-removal problem.
Thus, the heuristic optimally solves the original split-removal problem, assuming that
we can minimally color a graph.



9.  Untwisting 

9.1 Two Variable Untwisting

We saw in the previous chapter how to remove splits. If there is any remaining
inconsistency, we know by Theorem 7.5 that twists account for it. The simplest
example is the [...] that we discussed earlier. In this section we will
consider the problem of optimally untwisting a mxcc-graph with only two variables
involved. Let us use H_n to denote the subgraph of H* induced by the subset of nodes
that have exactly n mxcc's (n = 1 or 2 in this section). There will be arcs from H_2 to
H_1 (these are in neither subgraph), but because there are no splits, there will be no
arcs from H_1 to H_2. By definition, a twist has an associated simple undirected cycle.
We first concern ourselves with the case in which the cycle is in H_2. As we shall see,
these are the inconsistencies that are resolved by exchanges. Our approach is to first
find any set of exchanges that will work; these will be placed on arcs of H_2, so we
need merely identify the subset of arcs. This is done by fixing the ordering of the
pair of mxcc-sets on some node of H_2, propagating this ordering, and marking arcs
where trouble occurs. A bit more formally, one can use the following
depth-first-search algorithm.

Algorithm 9.1 Order mxcc-sets M_1, M_2 on node N
    Mark node N as "seen"
    Order M_1 before M_2 on N
    for A <- each H_2 arc leaving N
        N' <- the other end of A
        M_i' <- node from traversing mxcc-arc from M_i along A, i = 1 and 2
        if N' is seen
            if M_1' is after M_2' on N' then mark A
        else
            Order mxcc-sets M_1', M_2' on node N'

This  subroutine  is  used  in: 

Algorithm 9.2 Mark exchange arcs
    Initialize nodes of H_2 as not seen, arcs as not marked
    for N <- each node of H_2
        if N is not seen
            M_1, M_2 <- the two mxcc-sets for N
            Order mxcc-sets M_1, M_2 on N
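The two algorithms above can be sketched together as one function. The encoding is our own assumption: each node of the two-mxcc subgraph carries a pair of mxcc-set names, and each arc carries a mapping from the mxcc-sets at its tail to those at its head. As in the pseudocode, only arcs leaving a node are followed.

```python
def mark_exchange_arcs(mxccs, arcs):
    """mxccs: node -> (a, b), its two mxcc-sets in an arbitrary initial order.
    arcs : list of (tail, head, mapping); mapping sends each mxcc-set at
           tail to the mxcc-set at head reached along that arc.
    Returns the set of indices (into arcs) of the marked arcs."""
    leaving = {}
    for idx, (tail, head, mapping) in enumerate(arcs):
        leaving.setdefault(tail, []).append((idx, head, mapping))
    order, marked = {}, set()

    def order_sets(first, second, node):
        # Recording the order also serves as the "seen" mark.
        order[node] = (first, second)
        for idx, head, mapping in leaving.get(node, ()):
            f2, s2 = mapping[first], mapping[second]
            if head in order:
                if order[head] == (s2, f2):   # images arrive in the opposite order
                    marked.add(idx)
            else:
                order_sets(f2, s2, head)

    for node, (a, b) in mxccs.items():
        if node not in order:
            order_sets(a, b, node)
    return marked
```

The recursion depth equals the length of the longest path followed, so a very large graph would call for an explicit stack instead.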

As  we  hinted  earlier,  the  point  of  this  is: 


Theorem 9.1 If exchange instructions are placed on the arcs marked by the above
algorithm, the cycles for any remaining twists go through nodes in H_1.

Proof We observe first that if the above algorithm marks no arcs, there is no twist in
H_2. In this situation, a mxcc-path never goes from the first mxcc on one node to a
second mxcc on the next; thus it can never return to a different mxcc on the same
node.

Next, we consider the effect of placing an exchange on a marked arc of the
mxcc-graph. An exchange instruction requires four occurrences in order to be
properly represented by the cohabitation relation: two last uses (one for each
variable) and two generations (one for each variable). Each last use is required to
cohabit with the generation of the other name; this is the semantics of exchange. In
the before and after pictures below, the exchange instruction involves occurrences
with subscripts 3 and 4:


If the algorithm is run with the modified mxcc-graph, no arcs of H_2 will be marked,
and by the earlier remark, no twist will lie entirely in H_2. □

We consider twists that do not lie entirely in H_2. It is impossible for a twist to lie
entirely in H_1 because there are not distinct mxcc's at any node, by definition of H_1.
Any twist thus crosses the boundary from H_2 to H_1 with the H* arc in that direction.
Since the algorithm for growing a mxcc-set has been applied, the only way that the
number of mxcc's can decrease is if one of them dies (i.e., all the occurrences in the
mxcc die). Hence, there is only one mxcc-arc along this arc of H*, and either it
leaves from the first mxcc on the H_2 node, in which case we call it a type-1 arc, or
from the second mxcc, where it is a type-2 arc. This partitions the boundary arcs
between H_2 and H_1.



The idea for identifying the remaining twists is to propagate the type from a
boundary arc along forward paths (these necessarily remain in H_M^1), marking arcs
where this cannot be done consistently. First, we present the depth-first-search part,
then the initiator.

Algorithm 9.3 Propagate type t along A

N ← node pointed to by A
if N has a type
    if the type of N is not t then Mark A
else
    Make N be type t
    for A' ← each arc leaving N
        Propagate type t along A'

Algorithm 9.4 Mark move arcs

Initialize nodes of H_M^1 to have no type
for N ← each node of H_M^2
    for A ← each boundary arc leaving N
        t ← the type of A
        Propagate type t along A
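
The two algorithms above can be sketched in Python as follows. This is only an illustrative sketch: the graph representation (`boundary_arcs`, `h1_arcs`) and the function name `mark_move_arcs` are assumptions introduced here, and the sketch assumes that propagation stops at a node that already has a type.

```python
def mark_move_arcs(boundary_arcs, h1_arcs):
    """Sketch of Algorithms 9.3 and 9.4 (hypothetical data layout).

    boundary_arcs: list of (arc_id, target_node, arc_type) for arcs that
                   cross from the two-mxcc region into the one-mxcc region.
    h1_arcs:       dict node -> list of (arc_id, target_node) for arcs that
                   leave the node inside the one-mxcc region.
    Returns the set of arc ids on which a move would be inserted.
    """
    node_type = {}   # nodes start with no type
    marked = set()

    def propagate(t, arc_id, target):
        if target in node_type:
            # Inconsistent arrival: mark the arc and stop.
            if node_type[target] != t:
                marked.add(arc_id)
            return
        node_type[target] = t
        for next_arc, next_target in h1_arcs.get(target, []):
            propagate(t, next_arc, next_target)

    for arc_id, target, t in boundary_arcs:
        propagate(t, arc_id, target)
    return marked


# Two boundary arcs of different types whose forward paths meet at node "z".
boundary = [("b1", "x", 1), ("b2", "y", 2)]
h1 = {"x": [("a1", "z")], "y": [("a2", "z")]}
marked = mark_move_arcs(boundary, h1)
```

In this small example the second path reaches `z` with a type that disagrees with the one already recorded, so the arc `a2` is the one marked.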

The name of the latter algorithm anticipates the following result.

Theorem 9.2 If move instructions are inserted on the arcs marked by the above
algorithm, there are no twists through nodes of H_M^1.

Proof As with the previous result, we first note that if the algorithm marks no arcs,
there is no twist; any mxcc-path from H_M^2 into H_M^1 and back stays on the same type of
mxcc (first or second). Then, we look at the effect of a move instruction on the
mxcc-graph.


[Figure: the mxcc-graph before and after insertion of a move instruction]


In other words, the arc in H_M is effectively removed, so there is no possibility of
its participating in a twist.


The algorithms and theorems of this section have shown how to remove twists by
the insertion of exchanges and moves. So far, we have paid no attention to how to
do this optimally. In order to pursue this question, we will investigate the family of
markings that remove twists.


[Figure: a node N with an EXCH V,U instruction on each of its incident arcs (left),
compared with the same node without them (right)]


The code of node N will find V and U stored in opposite places in the two cases, but
as long as this is accounted for, it is clear that the left side is correct if and only if
the right side is. Although this picture is drawn with nodes and arcs of H_M^2 in mind, it is
actually correct if some of the arcs are boundary arcs, or if N is in H_M^1 and the arcs
are boundary arcs or H_M^1 arcs. We make the convention that the exchange swaps the
contents of the two memory locations corresponding to the two mxcc's, and optimize
the exchange to a move when it appears on a non-H_M^2-arc.

The second observation is even simpler to understand: two consecutive
exchanges reduce to nothing. Two moves along a non-H_M^2-arc also cancel. No
pictures are necessary to illustrate this.

We can combine the two observations into a single operation. Recall that the
algorithms mark certain arcs, upon which either exchanges or moves are inserted,
depending on membership in H_M^2. Pick a node N. Suppose we increase the number
of marks on each arc incident to N by one (following the first observation), and take
all marks off of a doubly marked arc (following the second observation). This amounts
to complementing the marks on the arcs incident to N, and leaves us with a marking
that will resolve the inconsistency (once exchanges and moves are inserted). Further,
this operation completely disregards direction in H_M, leading us to the following
nomenclature (analogous to that for Petri-nets).

Definition An undirected marked graph is an undirected graph, together with a
function from the arcs to {0,1} (the marking). To fire a node is to complement the
value of the function on arcs touching it. □
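
As a small illustration of this definition, the following Python sketch stores a marking as a dictionary from arcs to 0 or 1 and fires a node by complementing the arcs that touch it. The representation and the names `arc` and `fire` are assumptions made for the example.

```python
def arc(u, v):
    """An undirected arc is represented as a frozenset of its two ends."""
    return frozenset((u, v))


def fire(marking, node):
    """Return a new marking with every arc touching `node` complemented."""
    return {a: (1 - m if node in a else m) for a, m in marking.items()}


# A triangle with a single marked arc a-b.
triangle = {arc("a", "b"): 1, arc("b", "c"): 0, arc("a", "c"): 0}
after = fire(triangle, "a")        # a-b loses its mark, a-c gains one
twice = fire(after, "a")           # firing the same node again undoes it
```

Firing the same node twice restores the original marking, which is why only the set of fired nodes matters and not the order or multiplicity of firings.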

Theorem 9.3 Given a split-free ms-partition H_M, an underlying mxcc-graph, and a
marking of the arcs by the above algorithms of this section, any other marking of
the arcs will remove the twists if and only if it can be obtained from the first by
firing some sequence of nodes.

Proof We have seen above that any sequence of firings leads to a marking which will
remove the twists. Conversely, for any marking and undirected cycle in H_M, let the
parity of the cycle be the parity of the number of marks along arcs in the cycle. We
claim that for any cycle, the parity of markings that remove a twist is the
same: it is one if the cycle gives rise to a twist, and zero otherwise. Thus, the parity
of the sum of two such markings is zero. Since this holds for every cycle, we can
two-color the nodes of H_M so that nodes connected by an arc with no mark from
either marking, or marks from both, are the same color, while nodes connected by
arcs with precisely one kind of mark are different colors.

Now, start with one of the markings, and fire all the nodes of a given color. The
order is irrelevant; arcs with no marks or both marks stay the same, while arcs with
precisely one kind of mark receive the other kind of mark. Hence, any marking that
removes twists can be reached by firing nodes, starting with any other such marking.
□
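
The two-coloring argument in this proof can be sketched in Python. The function below is an illustration under assumed names and data layout (`firing_set_between`, markings as dictionaries keyed by `frozenset` arcs): it colors nodes so that an arc on which the two markings differ joins different colors, and returns one color class as the firing set, or `None` when some cycle has odd parity in the difference.

```python
from collections import deque


def fire_set(marking, fired):
    """Complement every arc with exactly one end in the set `fired`."""
    return {a: m ^ (len(a & fired) == 1) for a, m in marking.items()}


def firing_set_between(nodes, m1, m2):
    """Find a set of nodes whose firing turns m1 into m2, if one exists."""
    adj = {n: [] for n in nodes}
    for a in m1:
        u, v = tuple(a)
        d = m1[a] ^ m2[a]            # 1 where the two markings differ
        adj[u].append((v, d))
        adj[v].append((u, d))
    color = {}
    for start in nodes:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, d in adj[u]:
                c = color[u] ^ d
                if v not in color:
                    color[v] = c
                    queue.append(v)
                elif color[v] != c:
                    return None      # some cycle has odd parity
    return {n for n in nodes if color[n] == 1}


def arc(u, v):
    return frozenset((u, v))


m1 = {arc("a", "b"): 1, arc("b", "c"): 0, arc("a", "c"): 0}
m2 = {arc("a", "b"): 1, arc("b", "c"): 1, arc("a", "c"): 1}
F = firing_set_between(["a", "b", "c"], m1, m2)
```

Here `m1` and `m2` have the same parity around the triangle, so a firing set exists; a target marking with no marks at all would differ in parity and yield `None`.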

Given a frequency on an arc, we can calculate how much it would cost to place an
instruction there (see section 2.4). This defines the cost of an arc of H_M. Starting with
this cost, we have the further

Definition Given an undirected marked graph with costs on its edges, the cost of a
marking is the sum of the costs of marked edges. Given a marking, a minimal
equivalent marking is one that can be reached from the original marking by firing
nodes, and has cost no greater than any other such marking.

□

The next section will consider the problem of finding a minimal equivalent marking.
We summarize this section by


Algorithm 9.5 Optimally remove twists

Mark exchange arcs of H_M^2

Mark move arcs of H_M^1

Viewing H_M as an undirected graph, find a minimal equivalent marking

Insert exchanges and moves according to the minimal marking

9.2 Undirected Marked Graphs

In the previous section, we reduced the problem of optimal untwisting in the two
variable case to an optimization problem on undirected marked graphs. In this
section we show that this problem is NP-hard in general, but that the problems
arising in practice are likely to be easily solved. We begin with a simple observation
used in both the negative and positive results.

Lemma 9.1 Let G be an undirected marked graph, with a set of nodes N. Let F be
any subset of nodes. Then firing F and firing N - F lead to the same marking.

Proof The marking on an arc changes only when one of its ends is in the firing set
and one is not. This property is invariant when F is replaced by N - F.

□
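
Lemma 9.1 is easy to check on a small example. The following Python sketch uses an assumed representation (arcs as `frozenset` pairs, a helper named `fire_set`) and fires a set and its complement on the same marking.

```python
def fire_set(marking, fired):
    """Complement every arc with exactly one end in the set `fired`."""
    return {a: m ^ (len(a & fired) == 1) for a, m in marking.items()}


nodes = {1, 2, 3, 4}
marking = {
    frozenset((1, 2)): 1,
    frozenset((2, 3)): 0,
    frozenset((3, 4)): 1,
    frozenset((1, 3)): 0,
}
F = {1, 3}
# An arc has exactly one end in F precisely when it has exactly one end
# in the complement, so both firings flip the same arcs.
same = fire_set(marking, F) == fire_set(marking, nodes - F)
```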

The proof of the NP-hardness result is by reduction from 3-colorability ([3]). The
lemma below describes the gadget that will be used for each of the nodes of the
graph to be 3-colored.

Lemma 9.2 Consider the "tetrahedral" undirected marked graph:


Assume  that  the  center  node  does  not  fire.  Then  a  minimal  equivalent  marking  can 
be  obtained  by  firing  any  pair  of  other  nodes,  and  only  by  firing  a  pair. 

Proof Since the graph is symmetric on the outer three nodes when we consider the
center node distinguished (unfired), we just investigate what happens when firing 0, 1,
2, or 3 nodes. The picture for 0 fired nodes is unchanged from the above. The
other three pictures are:


[Figure: the tetrahedral graph with one node fired, two nodes fired, and three nodes fired]

The middle picture has a cost of two, the others either three or six.

□ 
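
The case analysis of Lemma 9.2 can be reproduced by brute force. The sketch below assumes, since the figure is not reproduced here, that the tetrahedral gadget is the complete graph on four nodes with every arc marked and of unit cost; this assumption is consistent with the costs two, three, and six quoted in the proof. The node names are hypothetical.

```python
from itertools import combinations

outer = ["r", "b", "y"]          # the three outer nodes
nodes = outer + ["c"]            # "c" is the center node, never fired
# Assumption: all six arcs of the tetrahedron are marked, each with cost 1.
marking = {frozenset(p): 1 for p in combinations(nodes, 2)}


def cost_after_firing(marking, fired):
    """Number of marked arcs after firing the given nodes."""
    fired = set(fired)
    return sum(m ^ (len(a & fired) == 1) for a, m in marking.items())


# For each k, the set of costs obtained by firing any k outer nodes.
costs = {
    k: {cost_after_firing(marking, f) for f in combinations(outer, k)}
    for k in range(4)
}
```

Under this assumption the minimum cost of two is reached exactly when a pair of outer nodes is fired, which is the property the reduction relies on.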

Next, we consider the gadget that will be used for each of the arcs in the graph to be
3-colored.


Lemma 9.3 In the undirected marked graph below, assume that a pair of nodes on
the left fire, and a pair of nodes on the right fire. Subject to this condition, a
minimal equivalent marking is obtained only if the fired pairs do not correspond
horizontally.

• - © - •

• - © - •

• - © - •

Proof By symmetry, there are essentially only two cases, when the fired pairs
correspond, and when they do not.

[Figure: the two cases, with corresponding fired pairs and with non-corresponding fired pairs]

□ 

Using  these  two  types  of  gadgets,  we  obtain 

Theorem  9.4  The  problem  of  finding  a  minimal  equivalent  marking  is  NP-hard, 
even  with  uniform  weights  on  the  arcs. 

Proof Given a graph to be 3-colored, form a node gadget for each of its nodes, as in
Lemma 9.2, all sharing a common central node. Label the other three nodes with
red, blue, and yellow. Where nodes are adjacent in the graph to be 3-colored,
connect the red-red, blue-blue, and yellow-yellow pairs of the corresponding node
gadgets with the arc gadget of Lemma 9.3.


By Lemma 9.1, we may assume that the common central node of the node gadgets
does not fire. We wish to arrange weights so that in any minimal equivalent
marking, a pair of nodes in each node gadget will fire. Thus, we make the cost of
arcs of node gadgets high compared to the cost of arcs of the arc gadgets (red-red, ...).
Specifically, if there are a arcs in the graph to be 3-colored, let the cost of a node
gadget arc be 3a+1, and the cost of the other arcs be 1. Suppose that there are n
nodes in the graph to be 3-colored. By Lemma 9.2, the cost of the marks in node
gadget arcs alone will be at least 2n(3a+1), and this will be achieved only when a pair
from every node gadget is fired. Assume that a firing set does not fire a pair from each
node gadget. By changing the firing on that node gadget alone, we can reduce the
cost by 3a+1, and at worst, increase the cost on arc gadget arcs by 3a, a net
reduction of at least one. In short, the cost of a minimal equivalent marking,
restricted to node gadget arcs, is fixed at 2n(3a+1), and we worry about the "excess
cost", which is between a and 3a.

Suppose we find a minimal equivalent marking. This chooses a pair of colors from
each node gadget. Mix these colors together, and assign this color to the corresponding
node of the graph to be 3-colored, i.e., red and blue yield purple, etc. If the excess
cost is exactly a, then by Lemma 9.3, the graph to be 3-colored has been successfully
3-colored. On the other hand, if the original graph can be 3-colored, say with orange,
green, and purple, the colors may be put through a prism to obtain fired pairs on the
node gadgets, with a total excess cost of exactly a. Thus, the original graph is
3-colorable if and only if the minimal equivalent marking in the derived graph has
cost 2n(3a+1)+a. This proves that finding a minimal equivalent marking is NP-hard.

It remains to be shown that we can make the construction using uniform weights.
The point is that we can replace each node gadget arc with 3a+1 (marked) arcs, and
obtain precisely the same behavior and still have an undirected marked graph whose
size is polynomial in the size of the graph to be 3-colored.

□ 

Observe that in the construction used to prove NP-hardness, every single arc is
marked. Recalling the construction of the undirected marked graphs used in
untwisting (section 9.1), it seems likely that comparatively few arcs will be marked.
The rest of this section concerns a technique that works well with few marked arcs.
We begin with a result that allows us to decompose the problem to a certain extent.



Theorem 9.5 Let G be an undirected marked graph. The cost of a minimal
equivalent marking is equal to the sum of the costs of minimal equivalent markings of
the biconnected components.

Proof We use induction on the number of articulation points. If there are no
articulation points, either G is biconnected, in which case the result is a tautology, or
else G consists of a single arc touching two nodes. If this arc is not marked, it is its
own minimal equivalent marking. If it is marked, fire one of the nodes to get an
equivalent marking with cost zero, and therefore minimal. This proves the result
when the number of articulation points is zero.

Assume G has an articulation point n. Then G is the union of some number of
subgraphs G_1, ..., G_k whose pairwise intersection consists only of the node n. Each G_i
will have fewer articulation points than G, so we may apply the result inductively to
each of them. Let F_i be the firing sets for each of the G_i that achieve a minimal
equivalent marking. By Lemma 9.1, we may assume that F_i does not contain the
node n. The union of the F_i is then a firing set for G that acts on each G_i exactly as
F_i does. This proves that the sum of the costs of minimal equivalent markings is
greater than or equal to the cost of a minimal equivalent marking of G. On the other
hand, let F be a firing set that achieves minimal cost for G. Restricted to G_i, F leads
to an equivalent marking for G_i, so the cost of a minimal equivalent marking for G is
greater than or equal to the sum of the costs of minimal equivalent markings.
Equality is established.

□ 

Since biconnected components can be computed in linear time (see [5]), this is a
useful reduction. In fact, this can be done at the same time that the arcs are marked
(see 9.1), because both are depth-first-search algorithms.

As we shall see, the techniques of flows in networks are useful in looking for a
minimal equivalent marking. The classic reference is [1]; we introduce some
standard terminology.

Definition Let F be a subset of the nodes of G. The boundary of F, denoted ∂F, is
the set of arcs with one end in F and the other in N - F. A cut is a set of arcs that
is the boundary of some set of nodes.

□ 

In network flow theory, numbers assigned to arcs are called capacities, because of the
original application of the theory. In the present context, they are costs, and we will
continue to call them that. We shall also extend the meaning of the term:

Definition The cost of a set of arcs (typically a cut) is the sum of the costs of its
elements. Given a marked graph G, the cost of a set of nodes F is the cost of the set
of marked arcs after F is fired.

□ 

The  following  result  is  the  connection  to  network  flows. 

Theorem 9.6 Let G be an undirected marked graph and H the subgraph subtended
by unmarked arcs. Let F be a firing set achieving a minimal marking. For every
marked arc in G, label an end as either a source or a sink, according to whether the
endpoint is or is not in F. Let C be a minimum cut of H between sources and sinks,
and let S be the set of marked arcs incident upon two sources or two sinks. Then

cost(F) = cost(S) + cost(C)

Proof Let C be a cut set, and let F' be a set of nodes such that ∂F' = C (without loss
of generality F' will contain all of the sources and none of the sinks). Fire all the
nodes of F'. This will remove all marks on arcs incident upon a source and a sink,
and will put marks only on the arcs in C. Thus

cost(F) ≤ cost(F') = cost(S) + cost(C)

The inequality is by the minimality of cost(F).

Conversely, given F, let C' = ∂F, so that any path in H from a source to a sink
must include an arc of C'. Firing F adds marks only to arcs of C'. Thus

cost(S) + cost(C) ≤ cost(S) + cost(C') = cost(F)

The inequality is by minimality of cost(C), and the result follows.

□ 

A minimal equivalent marking can trivially be found in time exponential in the
number of nodes of G. Because of network flow theory, minimum cuts can be found
in polynomial time, so the above result means that we can find a minimal equivalent
marking in time exponential in the number of marked arcs of G. In practice, this is
probably good enough, since this number is most likely one or two. A number of
heuristics can be devised, based on consecutive applications of max-flow-min-cut. One
of the simplest is the following, which uses a greedy approach to orienting marked
arcs.

Algorithm 9.6 Approximate minimal equivalent marking

All marked arcs are initially unoriented.

for A ← marked arcs in order of decreasing cost
    choose an orientation for A to minimize min-cut
    if cost of min-cut < cost of oriented arcs
        use cut to achieve lower cost marking
        restart the algorithm
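
For comparison with the heuristic, the trivial search that is exponential in the number of nodes can be sketched directly in Python. This is not Algorithm 9.6; it is a brute-force baseline suitable only for very small graphs, and the function name and data layout are assumptions made for the example. Following Lemma 9.1, one node is held unfired to halve the search.

```python
from itertools import combinations


def minimal_equivalent_marking(nodes, marking, cost):
    """Exhaustively search firing sets; return (cost, fired_set, marking)."""
    nodes = list(nodes)
    free = nodes[1:]      # by Lemma 9.1 the first node may stay unfired
    best = None
    for k in range(len(free) + 1):
        for fired in combinations(free, k):
            fs = set(fired)
            new = {a: m ^ (len(a & fs) == 1) for a, m in marking.items()}
            c = sum(cost[a] for a, m in new.items() if m)
            if best is None or c < best[0]:
                best = (c, fs, new)
    return best


def arc(u, v):
    return frozenset((u, v))


# A four-cycle whose only mark sits on its most expensive arc.
marking = {arc("a", "b"): 1, arc("b", "c"): 0, arc("c", "d"): 0, arc("d", "a"): 0}
cost = {arc("a", "b"): 5, arc("b", "c"): 1, arc("c", "d"): 2, arc("d", "a"): 3}
best_cost, fired, new_marking = minimal_equivalent_marking("abcd", marking, cost)
```

The cycle has odd parity, so every equivalent marking keeps an odd number of marks on it; the search moves the single mark to the cheapest arc by firing `b`.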

9.3 Permutation Labeled Graphs

We have considered the problem of untwisting when there are only two variables.
The results actually apply to the more general case in which each node of H_M has
at most two mxcc's. This and the next section consider the case in which a node of
H_M has any number of mxcc's. (We always assume that splits have been removed
from H_M.) We know that the two variable case is hard (Theorem 9.4), so the general
case must rely on heuristics as well. However, the added difficulties of this case do
not present any real discouragement, not because they are easy to handle, but
because of what must be their great rarity. The conditional exchange is a reasonably
natural example of how a twist can arise from a real program. I know of no similarly
natural example involving three or more variables. The best example I can devise is a
sort program that works on three scalar variables, written so that a value is not moved
until its precise point in the ordering is known. Such a program would present
three-variable twisting; why anyone would write such a program is not clear.

Nevertheless, for completeness, we consider general untwisting. Recall the
notation H_M^n, denoting the subgraph of H_M induced by nodes having exactly n mxcc's.
In this section, we will assume that H_M = H_M^n, i.e., every node of H_M has exactly n
mxcc's. This restriction is removed in the next section. The first task is to order
mxcc-sets on nodes of H_M and to make this ordering obey cohabitations along arcs of
H_M as far as possible. When this became impossible in the two-variable case, we
marked the arc. Here, the mark must carry more information; to specify it exactly is
to give the permutation on n elements which corresponds to the map induced by the
cohabitation arcs along the arc of H_M. From this point of view, the unmarked arcs of
the two variable case are labeled with the identity permutation; the marked arcs are
given the only other permutation on two elements, namely, a transposition. To make
this explicit in several variables, consider an algorithm like 9.1, and let us look at
what might happen along an arc to an already seen node, i.e., one at which the
ordering of mxcc-sets is fixed. For n = 3, there are essentially two cases:


Representing the permutations as products of cycles, and numbering the mxcc-sets
from left to right, we have the permutations (2 3) and (1 3 2); here each product
has only one non-trivial cycle. In the product of cycles representation, the identity
permutation can be represented by a null product, in keeping with the unmarked
nature of arcs with this permutation. We will exploit the fact that if the cycles are
disjoint, then the representation is unique, up to ordering in the product and cyclic
permutations in each cycle ([2], Theorem 5.1.1).

In the two variable case, we observed that a node could be completely surrounded
by exchanges, and the code of the node adjusted to preserve the correctness of the
program. In the present context, this observation must be made in terms of
permutations. Rather than merely "fire" a node, we are able to τ-fire a node, where
τ is a permutation (indices denote variables, not occurrences):

[Figure: a node before and after τ-firing, with variables V_1 ... V_n renamed to V_τ(1) ... V_τ(n)]


In words, suppose we follow the permutation σ with the permutation τ. On entering
the node, the variable that used to be in position i will now be found in position τ(i);
changing the names in the node will cause the program to behave in the same way as
before. To insure correctness after leaving the node, the effect of τ must be undone,
i.e., we must apply τ⁻¹, followed by σ. The effect of going through both arcs is
(σ∘τ⁻¹)∘(τ∘σ) = σ∘σ, just as before. In the case of multiple input arcs, we must
treat each one as above; similarly for output arcs. The notion of τ-firing leads to the
basic definition of this section.
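
The effect of τ-firing on arc labels can be sketched in Python. The representation is an assumption made for illustration: a permutation is a tuple `p` with `p[i]` the image of `i` (0-based), and `compose(f, g)` applies `g` first, matching the convention that following σ with τ gives τ∘σ. The helper names are hypothetical.

```python
def compose(f, g):
    """Return f∘g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))


def inverse(p):
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)


def tau_fire(tau, incoming, outgoing):
    """Labels after τ-firing a node.

    Each incoming label σ becomes τ∘σ and each outgoing label σ becomes
    σ∘τ⁻¹, so the composite along any path through the node is unchanged.
    """
    tau_inv = inverse(tau)
    return (
        [compose(tau, s) for s in incoming],
        [compose(s, tau_inv) for s in outgoing],
    )


sigma_in = (1, 2, 0)      # label on the arc entering the node
sigma_out = (0, 2, 1)     # label on the arc leaving the node
tau = (1, 0, 2)           # exchange the first two positions
new_in, new_out = tau_fire(tau, [sigma_in], [sigma_out])
through_before = compose(sigma_out, sigma_in)
through_after = compose(new_out[0], new_in[0])
```

The composite label along the path through the node is the same before and after firing, which is the correctness condition described above.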

Definition A permutation-labeled graph is

•  a directed graph G, and permutation group S_n,

•  a labeling for the arcs of G drawn from S_n, and

•  a function that gives the cost of labeling a given arc with a given
permutation.

The minimal equivalent labeling problem is to choose for each node N of G a
permutation τ_N such that the set of firings {τ_N} minimizes the sum of the costs of
the resulting labels on arcs of G.

□

In order to make any progress at all on this problem, we must make some restrictions
on the nature of the cost function. These are based on the following group-theoretic
notion.

Definition Given σ ∈ S_n, the signature of σ is obtained by expressing σ as the product
of disjoint cycles and forming the multiset of the lengths of the cycles.

□ 

Thus, the signature of (1 3 2) is {3}, while that of (1 4)(2 3) is {2,2}. We shall make a
reasonable cost assumption that the cost of a permutation on any given arc depends
only on its signature, that smaller signatures (in the sense of containment) have lower
cost, and that shorter cycles have lower cost. We are then able to exploit the following
group-theoretic result.

Lemma 9.4 The signature of an element of S_n is invariant under conjugation and
inversion. In particular, signature(σ∘τ) = signature(τ∘σ) for any σ, τ.

Proof For conjugation, see [2], Theorem 5.4.1; the result for inversion is clear by
inspection. For commutativity, σ⁻¹∘(σ∘τ)∘σ = τ∘σ.

□ 
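
The signature and the invariance stated in Lemma 9.4 can be checked exhaustively for a small group. The Python sketch below uses an assumed representation (a permutation is a 0-based tuple of images) and, following the examples in the text, omits cycles of length one from the signature; both choices are assumptions of the sketch.

```python
from itertools import permutations


def compose(f, g):
    """Return f∘g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))


def inverse(p):
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)


def signature(p):
    """Sorted tuple of the lengths of the non-trivial cycles of p."""
    seen, lengths = set(), []
    for i in range(len(p)):
        n, j = 0, i
        while j not in seen:
            seen.add(j)
            j = p[j]
            n += 1
        if n > 1:
            lengths.append(n)
    return tuple(sorted(lengths))


# (1 3 2) and (1 4)(2 3) from the text, written 0-based.
sig_three_cycle = signature((2, 0, 1))
sig_two_swaps = signature((3, 2, 1, 0))

# Exhaustive check of Lemma 9.4 over S_4.
s4 = list(permutations(range(4)))
commutes = all(
    signature(compose(s, t)) == signature(compose(t, s)) for s in s4 for t in s4
)
inversion = all(signature(inverse(s)) == signature(s) for s in s4)
```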

Suppose we are given a graph whose arcs are labeled by permutations represented as
products of disjoint cycles, thereby giving the graph a certain cost. We ask first
whether the cost can be reduced by picking a transposition τ and either τ-firing a
node, or leaving it unfired. We show that this problem reduces exactly to the
undirected marked graph problem considered previously, regardless of how
complicated permutations on the arcs are.

The undirected marked graph that we construct has exactly the same graph structure
as the one with which we started, except that the arcs are considered undirected.
The problem is choosing the costs. Consider an arc with permutation σ. If we τ-fire
one end of the arc, we get the permutation τ∘σ, while if we τ-fire the other end of
the arc, we get σ∘τ⁻¹. Since τ is a transposition, τ⁻¹ = τ, and by Lemma 9.4,
cost(τ∘σ) = cost(σ∘τ). Thus, if we fire one end of the arc, we get the same
difference in cost, no matter which end we fire. Further, if we fire both ends of the
arc, we get the permutation τ∘σ∘τ⁻¹, whose cost is unchanged from that of σ, again
by Lemma 9.4. Thus, in the undirected marked graph, the weight of this arc is
|cost(σ) - cost(τ∘σ)|. If cost(σ) > cost(τ∘σ), we mark the arc, so firing one end
causes a reduction in cost; otherwise, we leave the arc unmarked, so a single firing
causes an increase.
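
A minimal Python sketch of this construction follows. The cost function used here is purely hypothetical: it charges L + 1 for each non-trivial cycle of length L (as if each cycle were implemented by moves through a temporary), which satisfies the reasonable cost assumption but is not taken from the source. Permutations are 0-based tuples of images, and `compose(f, g)` applies `g` first.

```python
def compose(f, g):
    """Return f∘g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))


def signature(p):
    """Sorted tuple of the lengths of the non-trivial cycles of p."""
    seen, lengths = set(), []
    for i in range(len(p)):
        n, j = 0, i
        while j not in seen:
            seen.add(j)
            j = p[j]
            n += 1
        if n > 1:
            lengths.append(n)
    return tuple(sorted(lengths))


def cost(p):
    # Hypothetical cost depending only on the signature: L + 1 per cycle.
    return sum(length + 1 for length in signature(p))


def arc_weight_and_mark(sigma, tau):
    """Weight |cost(σ) - cost(τ∘σ)| and whether the arc is marked."""
    before, after = cost(sigma), cost(compose(tau, sigma))
    return abs(before - after), before > after


swap = (1, 0, 2)
identity = (0, 1, 2)
three_cycle = (1, 2, 0)
w1 = arc_weight_and_mark(swap, swap)          # firing removes the swap
w2 = arc_weight_and_mark(identity, swap)      # firing introduces a swap
w3 = arc_weight_and_mark(three_cycle, swap)   # firing shortens the cycle
```

An arc whose label is the transposition itself is marked, an identity-labeled arc is not, and a three-cycle containing the two elements adjacently is marked with a smaller weight.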

It is natural to want to limit the number of τ's that can be considered.
Improvement can occur only if cost(τ∘σ) < cost(σ) for at least one σ. This will
hold by the reasonable cost assumption if τ is among the disjoint cycles of σ. More
generally, it holds if the elements of τ are adjacent in some cycle of σ. This is
because shorter cycles have lower cost and:

(1 2)(1 2 a ... b) = (2 a ... b)

If σ does not involve the elements of τ, then clearly cost(τ∘σ) > cost(σ), because the
former signature is larger. The equivocal cases arise because of the following identity
(assume a_1 ... a_i are disjoint from b_1 ... b_j):

(1 2)(1 a_1 ... a_i)(2 b_1 ... b_j) = (1 a_1 ... a_i 2 b_1 ... b_j)

By considering the move instructions necessary to implement the permutations, we
would expect that

cost((1 a_1 ... a_i 2 b_1 ... b_j)) < cost((1 a_1 ... a_i)(2 b_1 ... b_j)), usually.

The only place this might fail is if the machine has an exchange instruction, in which
case we might have, for example,

cost((1 3)(2 4)) < cost((1 3 2 4))

In any case, once the structure of the machine is known, it is possible to eliminate a
large number of potential τ's at the outset. We also observe that several τ's may give
rise to the same undirected marked graph problem, for example, all those appearing
as potential cost reducers at only one arc.
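
The cycle identities used in this discussion can be checked mechanically. The Python sketch below assumes that products are composed right to left (the rightmost permutation is applied first), which is the convention under which following σ with τ is written τ∘σ; the helper `perm_from_cycles` is a hypothetical name introduced for the example.

```python
def compose(f, g):
    """Return f∘g: apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))


def perm_from_cycles(n, *cycles):
    """Build a permutation of 1..n from disjoint cycles of 1-based elements.

    Index 0 of the result is unused and maps to itself.
    """
    p = list(range(n + 1))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a] = b
    return tuple(p)


# (1 2)(1 2 3 4) = (2 3 4): a transposition of adjacent elements
# shortens the cycle.
short = compose(perm_from_cycles(4, (1, 2)), perm_from_cycles(4, (1, 2, 3, 4)))

# (1 2)(1 3 4)(2 5 6) = (1 3 4 2 5 6): a transposition across two
# cycles merges them.
merged = compose(
    perm_from_cycles(6, (1, 2)),
    perm_from_cycles(6, (1, 3, 4), (2, 5, 6)),
)
```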

As a first level heuristic to minimizing the cost, we would propose minimization
with respect to all transpositions τ as described above. Although this technique is
probably powerful enough to handle all the problems that don't arise in practice
anyway, there are some further observations that are simply too intriguing to be
omitted from this discussion. These are motivated by the following worry. Suppose
that the algorithm for ordering mxcc-sets labels only one arc with a non-trivial
permutation, but just happens to put it in the wrong place. Can't we use some simple
technique, like the network flow analysis of the previous section, to find the right
place?

Our approach is based on the following result. It shows that while we cannot
consider arcs to be undirected in the general case, we have great freedom in reversing
their direction.


Lemma 9.5 Given a permutation-labeled graph, reverse one of its arcs, and replace
the label on the arc by the inverse of the original label. The minimal equivalent
labeling problem has the same solution for each graph.

Proof Look at the reversed arc. In the original graph, let σ be the label on the arc
from N_1 to N_2. In the modified graph, σ⁻¹ will be the label on the arc from N_2 to N_1.
Now, τ-fire N_1 in each graph. In the first, the label becomes σ∘τ⁻¹, while in the
second, the label becomes τ∘σ⁻¹. Since these labels are inverses, their costs are equal
by Lemma 9.4. Similar remarks apply to N_2.

□ 

The interpretation of reversing an arc in the flowgraph, or even the mxcc-graph
derived from it, is difficult to contemplate.


This lemma suggests that for τ not necessarily a transposition, we may reduce the
problem of finding a subset of τ-firings that minimizes the cost not to an undirected
marked graph, but to a heuristic that strongly resembles the one used for undirected
marked graphs (Algorithm 9.6). We replace each directed labeled arc of the original
graph with a two-cycle of directed arcs, each weighted with the extra expense of
τ-firing the node at entry to the arc. A negative extra expense is interpreted as a
gain.

[Figure: original and transformed graph. An arc labeled σ is replaced by a two-cycle of
arcs weighted cost(τ∘σ) - cost(σ) and cost(σ∘τ⁻¹) - cost(σ).]


The idea of the heuristic may be summarized as follows. Let M be the set of all of
the arcs in the transformed graph that have negative weight, and let M_0 be any subset
of M.

Algorithm 9.7 Try for reduced labeling with M_0

Label as a source each node N such that
    N is an entry node for some arc in M_0, and
    N is not an exit node for any arc in M_0
Label sinks by the dual rule.

Remove all arcs in M from the transformed graph
Find a min cut

Let F be the set of nodes such that ∂F is the min-cut and such that F contains the
sources. Then τ-firing the elements of F will reduce the cost of the labeling by at
least

cost(M_0) - cost(∂F)

Any extra reduction comes because an arc with negative weight was in the cut, but
not in M_0. The network flow algorithm is able to take into account the different
costs involved in firing different ends of an arc, given an initial choice of orientation.
In Algorithm 9.6, the different orientations were tried as part of the heuristic. Here,
each element of M brings its orientation with it. If M has too many elements to try
all the subsets, we can use the following greedy heuristic.

Algorithm 9.8 Approximate minimal τ-firings

Sort M in order of decreasing |weight|

for A ← each element of M
    newM_0 ← M_0 ∪ {A}
    if M_0 contains arc opposite to A
        remove that arc from newM_0
    Try for reduced labeling with newM_0
    if this reduces the cost
        M_0 ← newM_0

Consider the effect of this algorithm when τ is a transposition. Then the pairs of
opposite arcs will have the same weight. If such arcs are next to each other after M
is sorted, the above algorithm is exactly the same as Algorithm 9.6. As before, other
heuristics might be proposed; we end our discussion here.

9.4 Coset Labeled Graphs

There is one final topic yet to be discussed. We have discussed only the case
H^n to H^n, where permutations are the natural labels on the arcs. When traversing a
boundary arc from H^n to H^m where n > m, a permutation is no longer the natural
label. For example, from H^3 to H^1, we might have (hollow nodes are unused
variables):

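[Figure: a boundary arc from a node with three used variables (solid) to a node
with one used variable (solid) and two unused variables (hollow).]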

Even when traversing to a yet unseen node, it is not quite right to view this as an
identity permutation on three elements, because what happens to 2 and 3 is
unimportant. We could try to model this with some sort of subset selection, but it is
technically more convenient to handle it group-theoretically, using the idea of cosets.
We will translate "what happens to 2 and 3 is unimportant" to "the coset consisting
of the identity followed by (multiplied on the left by) S_{2,3}", where S_{2,3} denotes the
subgroup of S_3 that has all the permutations of 2 and 3. Let ι be the identity
element of S_3. The coset is:

S_{2,3}·ι = { π·ι | π ∈ S_{2,3} }

In this case, as in all cases when propagating to an unseen node, the coset is actually
a subgroup of S_3, because the permutation generating it may be taken to be the
identity.

To say all this in proper generality, we must review the notion of type that we
introduced for the n = 2 case (see Algorithm 9.3). There we labeled nodes of H^1 as
type 1 or type 2. In general, we must label nodes in H^m with a subset of integers.
For use in coset construction, it is convenient for the subset to specify the variables
that don't appear; thus the type of a node in H^m will be a subset of n-m elements.
In the two variable case, type 1 we now view as type {2}. As a technical
convenience, we can view the type of a node of H^n as ∅; with this convention, the
cosets on H^n will consist of single elements and this section reduces to the previous
section. We will continue to write a permutation on the arc, and construct the coset
implicitly.

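[Figure: a node of type t1 with an arc labeled ρ arriving from a node of type t0 and
an arc labeled σ leaving to a node of type t2; the arcs carry, implicitly, the cosets
S_{t1}·ρ·S_{t0} and S_{t2}·σ·S_{t1}.]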

A graph labeled and interpreted like the above is a coset labeled graph. To complete
the definition, we must review r-firing a node and how to obtain a cost from a
permutation.


Suppose we r-fire the node labeled t1 above. First, realize that this changes the
type of the node from t1 to r(t1) = { r(i) | i ∈ t1 }. This is clear from the original
diagram for r-firing on page 108. Once this is done, we may replace σ by σ·r^{-1} and
ρ by r·ρ, just as before. To see why, compare the products from going through
both arcs:

(S_{t2}·σ·S_{t1})·(S_{t1}·ρ·S_{t0})   versus   (S_{t2}·σ·r^{-1}·S_{r(t1)})·(S_{r(t1)}·r·ρ·S_{t0})

To show identity we need show only that S_{r(t1)} = r·S_{t1}·r^{-1}, which is immediate. In
summary, to r-fire a node, we use the same rule for coset representative as we had
before, and permute the type by r.
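
The firing rule can be checked by brute force on a small case. The sketch below is an illustration, not the report's implementation: permutations are tuples on the elements 0..n-1 (0-based, unlike the text), S_t is enumerated explicitly, and the particular values of t0, t1, t2, sigma, rho and r are arbitrary choices made here. It compares the product of the two arc cosets before and after r-firing the middle node.

```python
from itertools import permutations

def compose(p, q):
    """Return p·q: apply q first, then p (permutations as tuples on 0..n-1)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def subgroup(t, n):
    """S_t: every permutation of 0..n-1 that moves only elements of t."""
    members = sorted(t)
    result = set()
    for images in permutations(members):
        p = list(range(n))
        for src, dst in zip(members, images):
            p[src] = dst
        result.add(tuple(p))
    return result

def coset(t_left, sigma, t_right, n):
    """The set S_{t_left}·sigma·S_{t_right}."""
    return {compose(a, compose(sigma, b))
            for a in subgroup(t_left, n) for b in subgroup(t_right, n)}

def product(c1, c2):
    """All products x·y with x in c1 and y in c2."""
    return {compose(x, y) for x in c1 for y in c2}

# Arbitrary example data (0-based elements).
n = 4
t0, t1, t2 = {3}, {1, 2}, {0, 1}
sigma, rho, r = (1, 0, 3, 2), (2, 0, 1, 3), (1, 2, 3, 0)

fired_t1 = {r[i] for i in t1}                       # the new type r(t1)
conjugated = {compose(r, compose(p, inverse(r))) for p in subgroup(t1, n)}
before = product(coset(t2, sigma, t1, n), coset(t1, rho, t0, n))
after = product(coset(t2, compose(sigma, inverse(r)), fired_t1, n),
                coset(fired_t1, compose(r, rho), t0, n))
```

Here `conjugated` equals `subgroup(fired_t1, n)` and `before` equals `after`, which is the identity the paragraph relies on.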

In H^n, we were able to relate the cost of a permutation to its canonical
representation as a product of disjoint cycles. This was a realistic reflection of the
implementation of a permutation, and was convenient group-theoretically. When the
types are non-null, we must pick a canonical form for cosets S_{t2}·σ·S_{t1}, and make
sure this canonical form reflects the implementation. What freedom is there in
picking another σ? The most general situation comes from looking at two (disjoint)
cycles of σ, one having distinct elements i1 ∈ t1 and i2 ∈ t2, the other having distinct
elements j1 ∈ t1 and j2 ∈ t2. Write these cycles beginning with the element of t2:

(i2 a1 ... b1 i1 c1 ... d1)(j2 a2 ... b2 j1 c2 ... d2)

Composing on the left by (i2 j2) ∈ S_{t2} and on the right by (i1 j1) ∈ S_{t1} gives another
permutation ρ with S_{t2}·ρ·S_{t1} equal to the original coset. This operation leaves the
other cycles of σ unchanged, and has the following effect on the above cycles:

(i2 a1 ... b1 i1 c2 ... d2)(j2 a2 ... b2 j1 c1 ... d1)

A variant of this occurs if we have a common element j in t2 ∩ t1, as if j1 = j2 and
a2 ... b2 is null:

(i2 j)(i2 a1 ... b1 i1 c1 ... d1)(j c ... d)(i1 j) = (i2 a1 ... b1 i1 c ... d)(j c1 ... d1)

The pattern to notice here is that what is on the way from i2 to i1 or j2 to j1 is
unchanged; what is on the way from i1 to i2 or j1 to j2 or j to j is interchangeable.
There are several ways to turn this observation into a canonical form. We choose to
record the t2-to-t1 information in the syntax (i2 a ... b i1| and the t1-to-t2
information using |c ... d). Call these left and right half-cycles. The following
result is the essence of why half-cycles give rise to a unique representation of cosets.


Lemma 9.6 Given t1, t2 ⊆ {1, ..., n}, and σ, ρ ∈ S_n; suppose S_{t2}·σ·S_{t1} = S_{t2}·ρ·S_{t1}.

Let i ∉ t1. If σ(i) ∉ t2, then ρ(i) = σ(i); otherwise ρ(i) ∈ t2 also. Similarly, let i ∉ t2. If
σ^{-1}(i) ∉ t1, then ρ^{-1}(i) = σ^{-1}(i); otherwise ρ^{-1}(i) ∈ t1 also.

Proof Suppose i ∉ t1 and σ(i) ∉ t2. Any element of σ·S_{t1} takes i to σ(i). Hence,
so must every element of S_{t2}·σ·S_{t1} = S_{t2}·ρ·S_{t1}. Since elements of S_{t1} do not affect i, every
element of S_{t2}·ρ must take i to σ(i). Assume ρ(i) ∈ t2. Then any element of S_{t2} will
take ρ(i) to an element in t2, i.e., not to σ(i), a contradiction. Thus we must have
ρ(i) ∉ t2, and ρ(i) is unaffected by any element of S_{t2}. Hence, ρ(i) = σ(i). If σ(i) ∈ t2,
reverse the roles of σ and ρ, and argue by contradiction. The second result has the
same proof.

□ 

We shall define a (t1,t2)-representative for S_{t2}·σ·S_{t1} by looking at the cycles of σ.

It is convenient to view one-cycles as among the cycles of σ, that is, if σ(i) = i, then
(i) is a cycle of σ. The (t1,t2)-representative will consist of cycles and half-cycles. At
each stage in the construction, we shall show that the representation is independent
of the choice of σ. As a first step, it is convenient to adjust σ so that its cycles do
not have repetitions of elements in t1 or t2.

Lemma 9.7 Given t1, t2, σ as above. We can find ρ such that S_{t2}·σ·S_{t1} =
S_{t2}·ρ·S_{t1} and such that every cycle of ρ has no more than one element of t1 and no
more than one element of t2.

Proof Inductively, it is sufficient to consider one cycle at a time. If a cycle has, say,
elements 1 and 2, both in t2, compose on the left by (1 2) ∈ S_{t2}. This leaves the
coset unchanged, and cuts the cycle:

(1 2)(1 a1 ... b1 2 a2 ... b2) = (1 a1 ... b1)(2 a2 ... b2)

Composing on the right has a similar effect.

□ 
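
The cycle-cutting step in this proof is easy to reproduce on a concrete permutation. The following sketch is illustrative only; it uses 0-based elements, so the pair called 1 and 2 in the proof appears here as 0 and 1, and the helper names are choices made for this example.

```python
def from_cycles(cycles, n):
    """Build a permutation (tuple on 0..n-1) from a list of disjoint cycles."""
    p = list(range(n))
    for cyc in cycles:
        for k, x in enumerate(cyc):
            p[x] = cyc[(k + 1) % len(cyc)]
    return tuple(p)

def cycles_of(p):
    """Return the cycles of p of length greater than one."""
    seen, result = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = p[i]
        if len(cyc) > 1:
            result.append(tuple(cyc))
    return result

def compose(p, q):
    """Return p·q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

# One cycle containing both 0 and 1, which we suppose lie in t2.
sigma = from_cycles([(0, 2, 3, 1, 4, 5)], 6)
swap = from_cycles([(0, 1)], 6)      # the transposition, an element of S_{t2}
cut = compose(swap, sigma)           # composing on the left cuts the cycle
pieces = cycles_of(cut)
```

After the composition, `pieces` holds two cycles, each containing exactly one of the two chosen elements.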

Henceforth, we will assume that representatives for a coset meet the conditions for ρ
in this lemma. A further simplifying effect is provided by the following result.

Lemma 9.8 Let t^-h •*>& suppose r(fj) ■ Ij. Then

***#*S'l  “ 

There  is  a  symmetric  result  for  /j€ 

Proof The result follows from:

|  w(A)  -  k } 


inductively),  the  definition  itself  does  not  depend  upon  the  choice  of 

More briefly, we consider a cycle γ that has no element of t2. The construction is
like the above, except that we use σ^{-1} to determine the rest of the cycle γ, and in
the proof we bring γ out to the left, not the right, and worry about /j- rather than

farM* 

We are reduced to the case in which every cycle of σ has exactly one element of
t1 and one element of t2. One possibility is that the elements are distinct: i1 ∈ t1,
i2 ∈ t2 and i1 ≠ i2. Write the cycle with i2 first, and form the left and right half-cycles:

(i2 a ... b i1 c ... d) ↦ (i2 a ... b i1| and |c ... d)

If "c ... d" stands for no elements, we form the empty right half-cycle | ). If "a ... b"
stands for no elements, we naturally have (i2 i1|. The only other possibility is that
the cycle has an element i ∈ t1 ∩ t2. We again get two half-cycles:

(i c ... d) ↦ ( | and |c ... d)

Again, the right half-cycle may be empty.
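
The splitting of cycles into half-cycles can be sketched as follows. This is a minimal illustration under assumptions made here: the permutation is already in the reduced form just described (every cycle that meets t1 or t2 has exactly one element of each), elements are 0-based, and an empty left half-cycle is represented by an empty tuple. Cycles that meet neither t1 nor t2 are kept whole.

```python
def cycles(p):
    """All cycles of p, one-cycles included (p is a tuple on 0..n-1)."""
    seen, out = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = p[i]
        out.append(cyc)
    return out

def representative(sigma, t1, t2):
    """Return (cycles, left half-cycles, right half-cycles) for a reduced sigma."""
    full, left, right = [], [], []
    for cyc in cycles(sigma):
        in1 = [x for x in cyc if x in t1]
        in2 = [x for x in cyc if x in t2]
        if not in1 and not in2:
            full.append(tuple(cyc))          # an ordinary cycle
            continue
        if len(in1) != 1 or len(in2) != 1:
            raise ValueError("cycle is not in the reduced form assumed here")
        k = cyc.index(in2[0])
        cyc = cyc[k:] + cyc[:k]              # write the t2 element first
        j = cyc.index(in1[0])
        if in1[0] == in2[0]:
            left.append(())                  # element of t1 ∩ t2: empty left half-cycle
        else:
            left.append(tuple(cyc[:j + 1]))  # (i2 a ... b i1|
        right.append(tuple(cyc[j + 1:]))     # |c ... d), possibly empty
    return full, left, right

# sigma has the cycles (1 2 0 3 4), (5 6) and (7 8); the values are arbitrary.
sigma = (3, 2, 0, 4, 1, 6, 5, 8, 7)
full, left, right = representative(sigma, t1={0, 5}, t2={1, 5})
```

In this example the first cycle yields the left half-cycle (1 2 0| and the right half-cycle |3 4), the second yields an empty left half-cycle and |6), and the third is an ordinary cycle.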

Why is this construction canonical? First, by the reduction stemming from
Lemma 9.8, every element of t2-t1 and t1-t2 will appear in some cycle. By
Lemma 9.6, we may start at i2 ∈ t2-t1 and iterate σ until we encounter an element
i1 ∈ t1-t2 (other possibilities have been eliminated). Both i1 and everything on the
way from i2 to i1 are determined independently of the choice of σ; thus, the set of
non-empty left half-cycles is independent of σ. But every element of t2 ∩ t1 will also
appear in a cycle of σ, so the number of empty left half-cycles is also fixed. This
proves that the determination of left half-cycles is canonical.

Suppose that there is some element not in t1 ∪ t2, and not on the way from i2 to i1
in some cycle. If we iterate σ on such an element, we eventually get to an element
of t2; if we iterate σ^{-1}, we get to an element of t1 (maybe the same element). Such a
segment of elements, not including endpoints, is independent of choice of σ; these
must appear in the non-empty right half-cycles |c ... d). But the number of right
half-cycles equals the number of left half-cycles, so the remaining right half-cycles
must be empty, and their number is independent of σ. This completes the proof that
the set of half-cycles is unique, and provides the basis of the proof by induction of
the following:


Theorem 9.7 There exists a unique (t1,t2)-representative for every coset of the form
S_{t2}·σ·S_{t1}, and it may be (easily) computed, given σ.


We now relate the (t1,t2)-representation to the code required to implement it,
beginning with left half-cycles. The arrows in the picture below indicate the move
instructions that are required:


The code sequence required is (subscripts name variables, not occurrences):

MOVE V_b, V_i1

• • •

MOVE V_i2, V_a

Because V_i1 is not used initially (this was the definition of t1), implementing a left
half-cycle is simpler than implementing a cycle: no exchanges are necessary, and
there is no need to use a temporary.

The implementation of a right half-cycle |e1 ... ek) is similarly easy:

[Figure: arrows indicating the move instructions for a right half-cycle.]

Recall that the last element of a right half-cycle goes to "some element" of t2. But
by definition t2 is the set of elements that we don't care about, so no move is
actually necessary. Similarly, the first element of a right half-cycle comes from
"some element" of t1, again, a variable that we do not care about. The only moves
that are necessary are:

MOVE V_e(k-1), V_ek
• • •
MOVE V_e1, V_e2

In particular, if a right half-cycle has only one element, no moves are necessary.
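
The move sequences for half-cycles can be generated mechanically. The sketch below assumes that a MOVE is written as a (source, destination) pair and that a half-cycle is given as a tuple of variable names in the order used in the text; both conventions, and the helper names, are assumptions made for this illustration. The same routine serves for left and right half-cycles, because in both cases each variable moves to its successor and the last destination is filled first.

```python
def half_cycle_moves(half):
    """Moves for a half-cycle, as (source, destination) pairs in execution order."""
    pairs = list(zip(half, half[1:]))   # each variable moves to its successor
    pairs.reverse()                     # fill the free destination first
    return pairs

def run(moves, registers):
    """Execute the moves on a dictionary of variable contents."""
    for src, dst in moves:
        registers[dst] = registers[src]
    return registers

# Left half-cycle (i2 a b i1| : i1 is unused on entry, so no temporary is needed.
left_moves = half_cycle_moves(("i2", "a", "b", "i1"))
state = run(left_moves, {"i2": "p", "a": "q", "b": "s", "i1": None})

# A right half-cycle with a single element needs no moves at all.
single = half_cycle_moves(("c",))
```

Running the moves in this order never overwrites a value that is still needed, which is the reason no exchange instruction appears.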


From the discussion of the implementation, it is clear that the cost of a coset
depends upon the structure of its (t1,t2)-representative. We extend the notion of
signature:

Definition Given t1, t2, σ as usual, the signature of S_{t2}·σ·S_{t1} is obtained from its
(t1,t2)-representative by forming a triple of multi-sets, one each for the lengths of
cycles, left half-cycles and right half-cycles.

□ 

In order to be able to use the techniques that we developed for permutation-labeled
graphs, we want to have an invariance of cost under "conjugation" (firing both ends
of an arc by r) and inverse.

Theorem 9.8 The signature of a coset is invariant under conjugation and inverse.

Proof Let the initial coset on an arc be S_{t2}·σ·S_{t1}. If we r-fire both ends of the arc,
we get the coset S_{r(t2)}·r·σ·r^{-1}·S_{r(t1)}.

Express σ as a product of cycles. The effect of conjugation by r is to replace each
element i in the representation by r(i) (this is the essence of the proof of Lemma 9.4).
The construction of the (t1,t2)-representation for σ proceeded by looking only at t1, t2
and the cycles of σ. If these elements are permuted by r, the final result will
have its elements permuted by r.

To prove the result for inverse, first note:

(S_{t2}·σ·S_{t1})^{-1} = S_{t1}·σ^{-1}·S_{t2}

The cycles and half-cycles in a (t1,t2)-representative are simply reversed to obtain
those for a (t2,t1)-representative of the inverse:

(a ... b) ↦ (b ... a)

(i2 a ... b i1| ↦ (i1 b ... a i2|

|c ... d) ↦ |d ... c)

A detailed proof that this works is omitted.

□ 

The techniques that we developed for permutation-labeled graphs depended on
very little:




References 

1. Ford, L.R., and Fulkerson, D.R. Flows in Networks. Princeton University Press,
Princeton, NJ, 1962.

2. Hall, Marshall, Jr. The Theory of Groups. Macmillan, New York, 1959.

3. Lewis, H. and Papadimitriou, C. The Efficiency of Algorithms. Sci. Am. 238
(January 1978), 96 ff.

4. Matula, D.W., and Beck, L.L. Smallest-Last Ordering and Clustering and Graph
Coloring Algorithms. J. ACM 30 (1983), 417-427.

5. Tarjan, Robert E. Depth first search and linear graph algorithms. SIAM
J. Computing 1 (1972), 146-160.

6. Tarjan, Robert E. Solving Path Problems On Directed Graphs. STAN-CS-75-528.
Stanford University, November, 1975.

7. Tarjan, R.E. Efficiency of a good but not linear set union algorithm. J. ACM 22
(1975), 215-225.

8. Wulf, William, et al. The Design of an Optimizing Compiler. Elsevier, New York.




An Intermediate Form for Bi-directional Scanning of Programs


1.  Introduction 

The purpose of this document is to discuss a data structure that aids in the
bi-directional scanning of programs. Our primary motivation for bi-directional
scanning  is  in  the  live-dead  analysis  that  aids  in  the  register  allocation  aspects  of 
code-generation;  it  is  also  useful  in  some  aspects  of  program  verification,  and 
proofs  of  program  termination. 

The  data  structure  that  we  use  may  be  based  on  any  representation  for  a 
directed  graph  that  allows  arcs  to  be  traversed  in  either  direction.  We  assume  that 
the  reader  is  acquainted  with  the  basic  operations  on  directed  graphs.  The  source 
language that we use here will be based on tree-like representations of functions,
not unlike those found in LISP or EL1. The generality required to handle these
languages  is  more  than  adequate  to  handle  languages  such  as  FORTRAN  or 
PASCAL. 

2.  Getting  started 

We  shall  take  as  our  source  language  a  language  based  on  the  lambda  calculus, 
but  with  declarations  added.  This  gives  us  a  very  simple  syntax.  A  term  is  one  of 
the  following: 


constant

variable

term0(term1, ..., termn) (application)

λx1, ..., xn . term (abstraction)

δx1: term1, ..., xn: termn . term0 (declaration)

The terminal classes constant and variable will be left undefined (xi is a variable).
The only other terminals are λ and δ; these are neither variables nor constants.


The first step in constructing a flow graph will be a naive translation from the
above syntactic classes to "straight-line code". This is accomplished by a function
denoted by T, and having two arguments. The first is the term to be translated,
and the second is the node at which scanning is to start. This node will have no
contents at entry to T, but will be incorporated in existing graph structure. The
result of T is the node at which scanning is to resume.

For  variables  and  constants,  T  installs  its  first  argument  as  the  contents  of  its 
second  argument,  and  connects  the  second  argument  to  a  new  node,  which  then 
becomes  the  result  of  T. 

T(c, ○) = (c) → ○          and similarly          T(x, ○) = (x) → ○

In each case the node now holding c or x is the 2nd argument of T, and the new
empty node ○ is the result of T.

Implicit  in  these  constructions  is  that  the  first  node  pushes  the  value  of  c  or  the 
present  value  of  x  onto  a  stack,  and  that  control  flows  on  to  the  next  node  in  the 
scan. 

To produce the graph for an application, the graphs for the constituents are
chained. The following graph expresses the semantics that applications are
evaluated by evaluating the operator and operands from left to right, followed by
the function call:

T(term0(term1, ..., termn), ○) = T(term0) → T(term1) → ... → T(termn) → (CALL n, ·) ⇢ ○

where the final ○ is the result of T and ⇢ stands for the pointer held as CALL's
second operand.


In words, the function and each of its arguments is pushed onto the stack, followed
by a "CALL". The pseudo-op CALL has two operands. The first is n, the number of
arguments to the function. The second is a pointer to a new node that becomes the
result of T. There is not an ordinary arc from the CALL node to the result of T
because ordinary arcs indicate direct flow. However, the new node is the
appropriate point to resume the scan.




We  shall  later  need  a  picture  of  the  stack  at  entry  to  an  abstraction.  For 
purposes  of  specificity,  we  shall  assume  that  CALL  replaces  the  function  that  is  n 
entries  below  the  top  of  the  stack  with  the  return  node,  i.e.,  the  second  operand, 
just before entering the abstraction.

As  an  example  of  the  graph  structure  set  up  by  T,  consider  the  term  f(g(x,y),z). 
Its  translation  is: 

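(f) → (g) → (x) → (y) → (CALL 2, ·) ⇢ (z) → (CALL 2, ·) ⇢ ○

Here ⇢ stands for the pointer held as CALL's second operand rather than an
ordinary arc. This is not unlike reverse Polish notation.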
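
The translation of constants, variables and applications can be sketched in a few lines of Python. This is an illustration under assumptions made here, not the report's code: a constant or variable is a string, an application is a pair (operator, list of operands), and the `Node` class and the `listing` helper are hypothetical names. Abstractions and declarations are left out.

```python
class Node:
    """A flowgraph node: contents plus one ordinary out-arc."""
    def __init__(self):
        self.contents = None    # None means "no contents yet"
        self.next = None        # ordinary arc (direct flow)

def T(term, node):
    """Install the code for term starting at node; return the node to resume at."""
    if isinstance(term, str):               # constant or variable
        node.contents = term
        node.next = Node()
        return node.next
    operator, operands = term               # application
    node = T(operator, node)
    for operand in operands:                # operands from left to right
        node = T(operand, node)
    resume = Node()
    # CALL's second operand points at the resume node; there is no ordinary arc.
    node.contents = ("CALL", len(operands), resume)
    return resume

def listing(node):
    """Follow arcs and CALL pointers, collecting node contents for display."""
    out = []
    while node.contents is not None:
        c = node.contents
        if isinstance(c, tuple):
            out.append("CALL %d" % c[1])
            node = c[2]
        else:
            out.append(c)
            node = node.next
    return out

start = Node()
end = T(("f", [("g", ["x", "y"]), "z"]), start)   # the term f(g(x,y),z)
code = listing(start)
```

The resulting listing reproduces the example translation: f, g, x, y, CALL 2, z, CALL 2, ending at an empty node.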

The graph of an abstraction involves a slight twist. On the one hand, the graph
of an abstraction must involve the graph of its body (the "term" in λx1, ..., xn .
term); on the other, in this language, an abstraction is a value. To denote the value
aspect of abstractions, we will use an index number i, and treat the abstraction
essentially as a constant.

T(λx1, ..., xn . term, ○) = (# i) → ○

The abstraction also has a program aspect, for which we want a graph. We let G[i]
be the flowgraph for abstraction i. This graph consists mainly of a graph for term,
with a header and trailer corresponding to function entry and exit:

G[i] = (ENTRY x0, x1, ..., xn) → T(term) → (EXIT ·)

The pseudo-op ENTRY gives the names x0, x1, ..., xn to the top n+1 entries on the
stack (x0 names the return address preceding the first argument; how that name is
chosen is not discussed here). The pseudo-op EXIT indicates that the 2nd through
n+2nd entries (return address and n arguments) are to be removed from the stack,
leaving the top entry as the result of the call. EXIT's operand is a back pointer to
the entry node of the abstraction, a useful piece of information, as we shall see.

The graph of a declaration involves the graphs of term1, ..., termn (which


The pseudo-op SCOPE indicates entry to a scope in which mutually recursive
functions can be declared; DECL indicates the point at which the names take
effect. The pseudo-op UNDECL has a single operand indicating the DECL that is to
be undone. The 2nd through n+1st entries are removed, leaving the result of term0
as the result of the declaration.

3.  Weak  Interpretation 

The purpose of this section is to describe how to splice together the various
graph  fragments  that  have  been  constructed  so  that  flow  of  control  on  function 
calls  is  represented  by  arcs  in  the  graph.  Because  functions  are  values,  the  process 
of  constructing  this  full  graph  must  involve  essentially  a  "symbolic  execution"  or 
"weak  interpretation"  of  the  program.  The  technique  described  here  has  more  the 
flavor  of  weak  interpretation  than  of  symbolic  execution,  insofar  as  there  is  a  real 
difference. 

We  shall  provide  a  property  set  [1]  or  monotone  framework  [2]  for  modeling 
the  state  of  an  actual  execution.  Since  the  state  for  the  language  just  described  is 
entirely  contained  in  the  stack,  we  will  call  the  particular  property  set  that  is 
developed  below  a  stack  model,  or  simply  sm. 

Every node of the graph will have an associated sm. Each sm may be a
relatively large object, and it may appear that the amount of data is massive.
However, even though at one level we view each sm as distinct, at another level we
provide for a great amount of sharing between sm's at adjacent nodes of the
flowgraph.  The  sm  field  at  each  node  is  of  constant  size  -  a  pointer  -  and  we 
expect  the  total  amount  of  space  consumed  by  sm's  to  grow  roughly  linearly  with 
the  size  of  the  program. 




At any time in the execution of a program, the stack is a linear sequence of
values (for uniformity, we think of return addresses as values), which we write
with the top of the stack at the beginning of the sequence.
In a recursive program, the stack may grow arbitrarily large, and so a stack model
cannot directly reflect all the entries in all possible stacks. Instead, a stack model
is a graph with a distinguished node, called the top node, because it corresponds to
the top of the stack (this graph is not to be confused with the flowgraph). The sm
field  of  a  flowgraph  node  points  to  the  top  node.  The  sense  in  which  an  sm  models 
an  actual  stack  is  that  any  actual  stack  will  be  a  path  through  the  graph  that 
constitutes  the  sm.  But  the  sm  is  of  bounded  size  (as  we  shall  see),  and  will  thus 
have  cycles  in  a  recursive  program.  It  may  be  the  case  that  there  are  paths  through 
an  sm  that  do  not  correspond  to  possible  stacks  during  execution.  These  represent 
loss  of  information  by  the  modeling  process,  a  necessary  property  of  any  analysis 
program  that  always  terminates. 

Running  through  the  stack  is  the  static  chain,  which  allows  finding  the  stack 
locations  of  names  that  are  visible  at  that  point  in  the  execution.  To  model  the 
static  chain,  certain  nodes  of  an  sm  will  be  marked  as  static-chain  nodes.  These 
nodes  will  have  a  list  of  associated  names,  a  pointer  to  the  sm  node  corresponding 
to  the  last  of  these  names,  and  a  pointer  to  the  next  node  in  the  static  chain. 
Because  the  static  chain  is  bounded  in  an  execution,  the  model  of  the  static  chain 
reflects  the  actual  static  chain  precisely. 

In order to splice together graph fragments, an sm must keep track of two types
of value, functions and return addresses. To keep track of functions, we attach to
each  sm  node  a  set  of  functions,  representing  the  set  of  all  functions  that  are 
possible  for  the  value  corresponding  to  the  sm  node.  There  are  two  types  of 
functions.  One  is  a  constant  function,  which  we  may  think  of  as  a  built-in.  The 


other  kind  of  function  is  a  pair  i,s  where  i  is  the  index  of  an  abstraction  and  s  is  the 
static  chain  in  which  that  abstraction  is  to  be  evaluated  (an  abstraction  alone  is  not 
a  function;  it  must  be  supplied  with  an  environment). 

To  keep  track  of  return  addresses,  we  do  not  label  the  sm  node  corresponding 
to  the  return  address;  rather,  we  label  the  sm  arcs  arriving  at  such  a  node,  each 
one  having  a  distinct  label.  As  we  shall  see,  this  provides  enough  information  to 
model  execution  reasonably  accurately. 

4.  Node  operations  on  stack  models 

In  this  section  we  will  describe  the  weak  interpretation  of  flowgraph  nodes. 
Each  flowgraph  node  has  an  associated  sm  describing  the  state  prior  to  execution 
of  that  node;  we  show  how  to  combine  the  prior  sm  and  the  contents  of  the  node  to 
yield a set of sm/flowgraph node pairs that describe the possible states after
execution.  The  action  to  be  taken  with  this  set  will  be  described  in  the  next 
section. 

Weak interpretation begins at an abstraction designated as the "main
program". Assume that this abstraction has n arguments. As we shall see in the
next section, its ENTRY node will point to an sm that is a simple chain of n+1
nodes. The last node in the chain corresponds to the return address. The fact that
this  sm  node  has  no  arcs  leaving  it  will  indicate  to  the  weak  interpreter  that  exit 
from  this  abstraction  corresponds  to  program  exit.  In  order  to  begin  weak 
interpretation  at  any  abstraction,  not  just  the  first,  it  is  necessary  to  have  not  only 
the  sm  corresponding  to  the  state  upon  entry,  but  also  a  model  of  the  static  chain, 
which  we  denote  by  s.  In  situations  where  there  are  no  symbols  defined  outside  the 
program,  s  may  be  nil;  if  there  are  symbols  that  must  be  known  during  weak 
interpretation,  these  can  be  represented  by  s.  To  process  an  ENTRY  node,  the 
weak  interpreter  "adjoins"  a  node  to  the  prior  sm.  This  means  that  a  new  node  is 




obtained,  and  an  arc  is  established  from  the  new  node  to  the  prior  sm,  i.e.,  to  the 
top  node  of  that  sm.  (As  we  shall  see,  this  basic  operation  is  used  in  the  weak 
interpretation  of  several  node  types;  it  corresponds  to  a  "push"  on  to  the 
interpreter's  stack).  In  the  case  of  an  ENTRY  node  the  adjoined  node  becomes  a 
static  chain  node.  The  list  of  names  is  the  operand  to  ENTRY.  The  sm  node 
corresponding  to  the  last  of  these  names  is  the  top  node  of  the  prior  sm 
(corresponding  to  the  last  argument).  The  static  chain  link  for  this  static  chain 
node  is  of  course  s.  In  this  case,  as  in  later  ones,  the  adjoined  node  becomes  the 
top node of the new sm. The result of the weak interpretation of the ENTRY node is
a singleton set whose pair is the new sm just described, and the flowgraph node
pointed to by the (necessarily) unique arc out of the ENTRY node.

The  weak  interpretation  of  a  node  containing  a  constant  also  involves  adjoining 
a  new  node  to  the  prior  sm.  It  is  necessary  to  attach  the  set  of  functions  that  this 
constant  will  evaluate  to.  If  the  constant  is  not  a  function,  this  set  is  null.  If  the 
constant  is  a  function,  then  the  set  is  a  singleton  consisting  of  the  constant. 

A  node  containing  a  variable  is  processed  in  the  same  way  as  a  constant  node, 
once  the  function  set  is  obtained.  To  obtain  the  function  set,  we  "look  up"  the 
variable  in  much  the  same  way  that  a  value  interpreter  would.  Specifically, 
beginning at the node pointed to by the sm field, follow sm arcs until a static-chain
node  is  seen.  If  the  desired  variable  is  not  among  those  at  this  static  level,  then 
follow  the  link  to  the  next  sm  node  in  the  static  chain,  and  repeat  the  above  step  at 
that  node.  If  the  static-chain  pointer  is  nil,  the  variable  is  undefined  and  the 
program  has  an  error.  If  the  desired  variable  is  in  the  set  of  names  at  a  node,  then 
we  can  find  its  offset  from  the  last  of  the  names,  and  we  also  know  the  sm  node  n 
corresponding  to  the  last  of  the  names.  Call  the  offset  p;  if  we  traverse  p  sm  arcs 
from  n,  we  arrive  at  a  node  corresponding  to  the  variable  being  looked  up.  The 




only  sm  nodes  with  multiple  out  arcs  correspond  to  return  addresses,  and  these 
always  occur  as  the  first  element  in  a  list  of  names.  Thus  there  is  never  any  choice 
in traversing the p arcs, and the name specifies a unique sm node.
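
The adjoin operation, ENTRY processing and variable lookup described above might be modeled as follows. This is a sketch under assumptions made here: the class `SmNode`, its field names, and the choice to follow the first out-arc where the text says "follow sm arcs" are all illustrative and not taken from the report.

```python
class SmNode:
    """One node of a stack model; arcs lead toward deeper stack entries."""
    def __init__(self, below=None, functions=()):
        self.arcs = [] if below is None else [(None, below)]  # (label, node)
        self.functions = set(functions)
        self.names = None   # list of names if this is a static-chain node
        self.last = None    # sm node for the last of those names
        self.link = None    # next static-chain node, or None for nil

def adjoin(top, functions=()):
    """Push: a new node with an arc to the top node of the prior sm."""
    return SmNode(below=top, functions=functions)

def enter(prior_top, names, s):
    """Weak interpretation of ENTRY: adjoin a static-chain node."""
    node = adjoin(prior_top)
    node.names, node.last, node.link = list(names), prior_top, s
    return node

def lookup(top, name):
    """Find the sm node for a variable, much as a value interpreter would."""
    node = top
    while node.names is None:            # walk down to a static-chain node
        node = node.arcs[0][1]
    while name not in node.names:        # walk along the static chain
        if node.link is None:
            raise NameError(name)        # undefined variable: program error
        node = node.link
    p = len(node.names) - 1 - node.names.index(name)   # offset from last name
    target = node.last
    for _ in range(p):
        target = target.arcs[0][1]
    return target

# Main program with two arguments: a chain of n+1 = 3 nodes, then ENTRY.
ret = SmNode()                 # return address, no arcs leaving it
arg1 = adjoin(ret)
arg2 = adjoin(arg1)
entry = enter(arg2, ["x0", "x1", "x2"], None)
top = adjoin(entry)            # e.g. after pushing a constant
```

With this setup, looking up x2, x1 and x0 from `top` yields the nodes for the last argument, the first argument and the return address respectively.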

A node corresponding to an abstraction (pseudo-op #) is also treated like a
constant or variable node, once the function set is found. In this case, the function
set is a singleton, consisting of the pair i,s, where s is the first static-chain node seen
when following sm arcs from the node pointed to by the sm field.

For  constant,  variable  and  abstraction  nodes,  the  result  of  weak  interpretation 
is  a  singleton  set,  whose  pair  is  the  described  sm,  and  the  node  obtained  by 
following the unique flowgraph arc out of the node.

The processing of the CALL and EXIT pseudo-ops provides the interprocedural
linking that we desire. We consider CALL first. The first operand (the n) and the
sm field attached to the node yield the function set that is possible from the call.
The result of weak interpretation will have one sm/node pair for every element of
the function set.

We first examine built-in functions. Each such function has built-in semantics
that must be properly represented by the graph. Many built-in functions have little
interaction with flow of control. For instance, none of the arguments of + are
functions; its result is not a function and its effect is merely to go on to the next
step. This effect can be represented by connecting the CALL node to the second
operand of the CALL, indicating the flow of control that actually occurs. The sm
for such a function is obtained by traversing n+1 arcs from the node pointed to by
the sm field of the call node (corresponding to popping the arguments and the
function value), and then adjoining at this sm node a node that represents the result
of the built-in (corresponding to the push). The pair for such a function is the sm
just described, and the second operand of the CALL.
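
A minimal sketch of this pop-and-push step, in Python. The names `SMNode`, `funcs`, `succ`, and `interpret_builtin` are illustrative assumptions; the prior sm is never modified, so other flowgraph nodes that share it remain valid.

```python
class SMNode:
    """Hypothetical stack-model node: a function set and one out arc."""
    def __init__(self, funcs=frozenset(), succ=None):
        self.funcs = frozenset(funcs)
        self.succ = succ

def interpret_builtin(top, n):
    """Model a built-in of n arguments whose result is not a function.

    Skips the n arguments and the function value, then adjoins a fresh
    result node with an empty function set at that point.
    """
    base = top
    for _ in range(n + 1):             # n arguments plus the function value
        base = base.succ
    return SMNode(frozenset(), base)   # the pushed result

# Prior stack for "x + y": rest, then the value of +, then x, then y on top.
rest = SMNode()
plus = SMNode(succ=rest)
x = SMNode(succ=plus)
y = SMNode(succ=x)
new_top = interpret_builtin(y, 2)
```

The new sm shares `rest` with the prior sm; only one new node is created.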




There  are  some  built-in  functions  that  are  more  interesting:  those  that  affect 
flow  of  control,  and  assignment.  These  are  discussed  in  later  sections. 

To process a pair i,s corresponding to an abstraction, we imitate in the sm the
action that would be taken in execution, where the function value is replaced with
the return address. This cannot be done directly in the sm because of data sharing.
Instead, it is necessary to adjoin a new node to the node n+1 steps from the top of the
prior sm. This node corresponds to the return address and the new sm arc is
labeled with the return node. Then a copy is made of the top n nodes of the prior
sm, and the last of these nodes is linked to the return address node that was just
adjoined. Recall that beginning the weak interpretation of an abstraction requires
not only sm, but also the static chain. In this case, that is simply the second
component of the pair i,s. The flow of control from the CALL node to the
ENTRY node is indicated by establishing a flowgraph arc from one to the other. The
contribution that an abstraction makes to the result of weak interpretation is the
pair consisting of the described sm and the ENTRY node of the abstraction.
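
The construction of the new sm for a call to an abstraction can be sketched as follows. This Python fragment is only an illustration under assumed names (`SMNode`, `tag`, `succ`, `labeled`, `interpret_call_abstraction`); it omits the static chain and the flowgraph arc, and shows only the adjoining of the return address node and the copying of the n argument nodes.

```python
class SMNode:
    """Hypothetical stack-model node."""
    def __init__(self, tag, succ=None, labeled=None):
        self.tag = tag
        self.succ = succ               # single unlabeled out arc
        self.labeled = labeled or {}   # return node -> sm node (return address nodes only)

def interpret_call_abstraction(top, n, return_node):
    """Build the sm seen at the ENTRY node without touching the prior sm."""
    args, node = [], top
    for _ in range(n):                 # collect the n argument nodes
        args.append(node)
        node = node.succ
    base = node.succ                   # node n+1 steps from the top (below the function value)
    new_top = SMNode("return-address", labeled={return_node: base})
    for arg in reversed(args):         # copy the arguments, deepest first
        new_top = SMNode(arg.tag, succ=new_top)
    return new_top

rest = SMNode("rest")
f = SMNode("f", rest)                  # the function value
x = SMNode("x", f)
y = SMNode("y", x)                     # top of the prior sm
entry_sm = interpret_call_abstraction(y, 2, "R")
```

The prior sm still runs y, x, f, rest; the new sm runs y, x, return-address, with an arc labeled "R" leading to rest.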

The sm associated with a node containing an EXIT pseudo-op has a special form.
The top node represents the result of the application. Traversing one sm arc
arrives at a static-chain node, which tells how many more steps must be traversed to
arrive at the node corresponding to the return address. The result of the weak
interpretation of the EXIT node will have one pair for each arc leaving the return
address node, where the node component for the pair is the label on the arc (the
return node). Copy the top of the prior sm and make it point to the node at the
other end of the labeled sm arc, representing the removal of the return address and
arguments. The new node is the top node of the sm associated with the return
(flowgraph) node. To indicate flow of control, establish an arc from the EXIT node
to the return node.


The pseudo-ops remaining to be discussed are all associated with declaration:
SCOPE,  DECL  and  UNDECL.  The  actions  of  these  pseudo-ops  must  be  coordinated, 
but  there  are  several  ways  of  achieving  the  desired  effect:  allowing  the  definition 
of  a  mutually  recursive  set  of  routines.  We  shall  present  one  method  here, 
corresponding  rather  transparently  to  a  reasonable  implementation.  To  preview: 
SCOPE  establishes  a  new  static  chain  link,  but  with  no  name;  DECL  fills  in  the 
names  and  the  stack  pointer;  and  UNDECL  removes  everything  except  the  top  of 
the  stack.  We  now  discuss  each  case  in  more  detail. 

To  weakly  interpret  SCOPE,  adjoin  a  new  node  to  the  prior  sm.  This  node  will 
be  a  static  chain  node,  whose  static  chain  link  is  obtained  by  traversing  the  prior 
sm  until  a  static  chain  link  is  found  (there  will  be  no  branching  along  this  sm  path). 
The  list  of  names  for  this  static  chain  node,  as  we  stated  above,  is  empty.  Since  no 
names  will  be  found  at  this  static  chain  link  under  these  conditions,  the  pointer  to 
the  sm  node  for  the  last  name  is  irrelevant,  and  may  as  well  be  nil.  Note  that  this 
allows the weak interpretation (and evaluation) of term1, ..., termn in an
environment  where  the  names  have  not  yet  been  installed. 

To  weakly  interpret  DECL,  traverse  n  sm  arcs  from  the  distinguished  node  of 
the  prior  sm  (there  will  be  no  branching  along  this  path),  arriving  at  the  static 
chain  node  that  was  established  by  SCOPE.  The  effect  of  DECL  is  merely  to 
install  into  the  name  field  here  the  list  of  names  of  the  DECL,  and  to  set  the 
previously  nil  pointer  to  the  present  distinguished  sm  node,  i.e.  the  top  of  stack. 
This  cannot  be  done  to  the  extant  sm  graph  structure,  because  that  would 
invalidate  the  stack  models  of  flowgraph  nodes  pointing  to  the  shared  structure.  It 
is  necessary,  at  least  conceptually,  to  copy  the  n+1  nodes  that  represent  the  top  of 
the  stack,  and  only  then  make  the  changes  to  the  static  chain  node.  The  first  node 
of  the  copied  chain  is  the  top  node  for  the  new  sm. 


The  weak  interpretation  of  UNDECL  is  what  one  would  expect:  the  top  sm 
node  is  copied,  but  attached  to  the  node  obtained  by  traversing  n+1  arcs  past  it, 
thereby  popping  not  only  the  n  variables,  but  also  restoring  the  static  chain  to  what 
it  was  before  the  declaration. 

In  the  processing  of  every  node  type,  observe  that  the  amount  of  new 
(unshared)  graph  structure  is  proportional  to  the  size  of  the  program.  In  fact,  for 
all  but  DECLs,  only  one  new  sm  node  is  required  at  each  point.  For  a  DECL,  the 
number  of  nodes  is  the  number  of  variables  plus  one,  but  variables  contribute  to 
program  length. 

5. Disjunction

Weak interpretation uses a set Q of "unprocessed nodes". These are nodes that
must be processed before a consistent weak interpretation has been attained. We
have already noted that at the beginning of weak interpretation the ENTRY node
of the main program is given an initial sm. The rest of initialization consists of
ensuring that the sm fields of all other nodes are set to nil (meaning that flow
cannot arrive at this node), and initializing Q to the singleton whose element is the
entry node.

The general outline of weak interpretation consists of removing a node from
Q, and processing it according to the description of the previous section. The
result is a set of pairs (sm, fn) consisting of an sm that is valid prior to the
flowgraph node fn. The key to termination of weak interpretation is whether fn is
placed back in Q, and if so, what is the value of the new sm field of fn. The theory
of weak interpretation says that the key to obtaining a correct and terminating
weak interpretation is the definition of a suitable disjunction operation on stack
models. Given such a disjunction operation, we apply it to sm and the sm already
attached to fn. If the result is the same as the old sm attached to fn, fn is not
placed back in Q. Otherwise, fn is put back in Q, and its new sm is the result of
disjunction.
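
The control loop of this iteration can be sketched as a conventional worklist algorithm. The Python below is a sketch under stated assumptions: `weak_interpret`, `process`, and `disjoin` are hypothetical names, a nil sm is modeled by absence from a dictionary, and in the demonstration a stack model is replaced by a plain set of visited node names so that the loop can be run in isolation.

```python
def weak_interpret(entry, initial_sm, process, disjoin):
    """Iterate to a consistent assignment of sm values to flowgraph nodes."""
    sm = {entry: initial_sm}           # nodes absent from sm have a nil sm field
    queue = [entry]                    # the set Q of unprocessed nodes
    while queue:
        node = queue.pop()
        for new_sm, fn in process(node, sm[node]):
            old = sm.get(fn)
            merged = new_sm if old is None else disjoin(old, new_sm)
            if merged != old:          # changed: record it and put fn back in Q
                sm[fn] = merged
                if fn not in queue:
                    queue.append(fn)
    return sm

# Toy flowgraph; "dead" is not reachable from "entry".
edges = {"entry": ["a"], "a": ["b", "a"], "b": [], "dead": ["b"]}

def process(node, s):
    # Stand-in for the per-node rules: pass along the set of nodes seen so far.
    return [(s | {node}, succ) for succ in edges[node]]

result = weak_interpret("entry", frozenset(), process, lambda s, t: s | t)
```

After termination, `"dead"` has no entry in `result`, mirroring the observation that a nil sm field means flow never arrives at the node.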

We now consider the disjunction of sm1 and sm2. If sm1 is nil, then the result
is sm2; if sm2 is nil, the result is sm1. This corresponds to the observation that if
the sm field of a node is nil, then flow does not arrive there. The logical or of this
condition and any logically weaker condition (flow arrives here with the stack
having thus and such a shape) is the logically weaker condition. Since a nil sm is
never propagated forward, if this case applies, the node is always put back in Q. If
weak interpretation terminates and the sm field of some node is nil, then it is
indeed true that flow never arrives at that node.

If sm1 and sm2 are both non-nil, we have a more interesting question. To see
how to define disjunction, recall that the basic definition of a stack model is that
its paths limit what might be seen as the values of a stack. Suppose we want to
construct sm0 whose set of paths is as small as possible, but still contains the union
of the paths for sm1 and sm2. Ignoring efficiency considerations for the moment,
let sm0 be the graph consisting of disjoint copies of sm1 and sm2. We will describe
a process called "pinching". This begins at the top nodes of sm1 and sm2. To pinch
two nodes, coalesce them in the graph, and attach as a function set the union of
the function sets. Then examine the out arcs of the two nodes. If any of the out
arcs are labeled, then all must be labeled, because we have just pinched a return
address node. In this case, we pinch nodes at the ends of identically labeled arcs
(which come in pairs). Otherwise, there will be at most a pair of out arcs
(corresponding to the single out arcs in sm1 and sm2). If there is a pair, pinch the
two nodes.
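
Pinching can be sketched with a union-find representation of coalesced nodes. The Python below is one possible rendering, not the report's implementation: `Node`, `rep`, `find`, and `pinch` are hypothetical names, the unlabeled out arc is stored under the key `None`, and the function mutates its arguments, so the caller is assumed to pass the disjoint copies of sm1 and sm2 mentioned above.

```python
class Node:
    """Hypothetical sm node: a function set and out arcs keyed by label."""
    def __init__(self, funcs=(), out=None):
        self.funcs = set(funcs)
        self.out = dict(out or {})     # label -> Node; None labels an unlabeled arc
        self.rep = self                # union-find parent

def find(n):
    """Return the representative of the class of coalesced nodes containing n."""
    while n.rep is not n:
        n = n.rep
    return n

def pinch(a, b):
    """Coalesce a and b, then pinch the targets of identically labeled arcs."""
    work = [(a, b)]
    while work:
        x, y = work.pop()
        x, y = find(x), find(y)
        if x is y:
            continue                   # already coalesced (this is how cycles close)
        y.rep = x
        x.funcs |= y.funcs             # union of the function sets
        for label, target in y.out.items():
            if label in x.out:
                work.append((x.out[label], target))
            else:
                x.out[label] = target  # arc present on one side only
    return find(a)

bottom = Node()
m1 = Node(out={None: bottom})
t1 = Node({"f"}, {None: m1})           # top of sm1
m2 = Node()
t2 = Node({"g"}, {None: m2})           # top of sm2
merged = pinch(t1, t2)
```

Each step either finds the two nodes already coalesced or reduces the number of distinct nodes by one, so the loop terminates on finite graphs.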

While this is not the place for a detailed proof, completion of pinching results
in a stack model having the desired validity property, in other words, the union of
the set of paths of sm1 and sm2 is contained in the set of paths of sm0, and this is the
smallest set of paths describable by a stack model. More interesting is the fact
that even though the semilattice described by this disjunction operation is not
well-founded, weak interpretation still terminates. To prove this, we observe that an
sm-node for the return address of a particular abstraction can appear at most once
in any sm attached to a node. This is certainly true initially. The only place that a
return address sm node is adjoined is at a CALL node. This may temporarily
produce an sm with two return nodes for a particular abstraction, but in the
disjunction that necessarily precedes attachment to a flowgraph node, the two nodes
are merged (this is how cycles arise in the sm graph). To the observation that the
number of return address nodes is finite, we add the observation that the out
degree of these nodes is bounded, since each such sm arc is labeled by a distinct
flowgraph node, and there are a fixed number of these. The paths between return
nodes cannot grow indefinitely, and thus the sm graph is finite.

6.  Assignment 

In  confronting  the  issue  of  assignment,  the  issues  of  parameter  passing, 
aliasing,  and  a  host  of  related  concerns  arise.  We  shall  provide  a  quite  simple  way 
of  viewing  assignment,  in  which  all  of  the  other  issues  can  be  expressed.  Simply 
put,  locations  will  be  treated  as  bona  fide  values,  and  locations  will  be  kept  track 
of  in  much  the  same  way  as  function  values.  In  what  we  have  described  thus  far, 
parameter  passing  has  been  by  value,  and  that  will  continue  to  be  our  model.  To  do 
parameter passing by name, simply pass locations along; aliasing is represented by a
location  set. 

We have already seen that an abstraction is sometimes treated as a value and
sometimes as a procedure. Similarly a name x is sometimes treated as a variable
(as we have done already), and sometimes needs to be treated as a location (for
purposes of assignment). In the syntax for a term, we have discussed the terminal
class "variables"; symbols in this class are treated as variables, i.e., evaluating
them yields their value. Without raising the issue of surface syntax, it is possible
for a constant to be a symbol. In this case, the evaluation of the constant is the
location on the stack designated by the symbol at the time of evaluation.

In  order  to  model  assignment,  we  must  extend  the  notion  of  a  stack  model,  in 
that  now,  we  associate  not  only  a  function  set  with  each  node,  but  also  a  location 
set.  (We  can  arrange  the  representation  so  that  the  representation  of  two  null  sets 
is  no  more  expensive  than  was  the  representation  of  the  single  null  set.  If  the 
language  is  strongly  typed,  we  might  also  take  advantage  of  the  fact  that  the  two 
sets cannot be simultaneously non-null.) The representation of a location is a
pointer  to  an  sm  node. 

It  is  necessary  to  describe  how  to  weakly  interpret  a  constant  node  that  is  a 
symbol.  Given  a  prior  sm,  the  symbol  is  "looked  up"  in  the  same  way  as  a  variable, 
but  the  pointer  to  the  sm  node  corresponding  to  the  symbol  is  returned,  not  the 
attached function set. The node adjoined to the prior sm is given a singleton
location set, consisting of the sm node that was found on look up. The attached
function  set  is  of  course  null. 

In  the  weak  interpretation  of  a  variable  node,  looking  up  the  variable  now 
produces  both  a  function  set  and  a  location  set.  These  constitute  the  pair  that  are 
attached  to  the  adjoined  node. 

We now describe the weak interpretation of the constant function ASSIGN. Its
first  argument  produces  a  location,  and  the  second,  a  value  that  will  be  stored  in 
the  location. 


First, since ASSIGN does not affect flow of control, establish a flowgraph arc from
the CALL node to its successor. To obtain an sm for the location, we obtain a new
sm for each element of the location set of the second entry in the prior sm. These
sm's are disjoined by the technique of the previous section, and the result is the
new sm. In most cases, there will be only one element of the location set, so there
is no necessity to do a disjunction.

The  problem  thus  reduces  to  describing  the  effect  on  sm  of  assigning  to  a 
single  location.  Since  assignment  of  location  values  and  function  values  is  rare  in 
typical  programs,  the  location  and  function  sets  of  both  the  location  being  assigned 
to,  and  the  top  of  the  prior  sm,  are  most  probably  null.  In  this  case,  the  assignment 
has  no  effect  on  data  of  interest  to  a  stack  model,  and  the  new  sm  is  obtained  by 
popping  three  nodes  off  the  prior  sm,  and  adjoining  a  node  that  represents  the 
result of assignment (depending on one's taste, this might be the first argument, the
second,  or  a  canonical  nothing  result). 

Suppose that the assignment is of a function or location value. What we want
to do is simply to change the function-set/location-set pair to be the pair on the top
of the prior sm. This cannot be done literally, because the prior sm is pointed to
from the sm fields of other nodes. A correct algorithm would be to copy the prior
sm before changing a field in it. A more sophisticated approach is to copy only as
much as necessary, and to combine the necessary copying with the disjunction that
may be necessary. These are details beneath the level of the current discussion.

Another natural operation on locations is DEREF, which we take to be a
constant function. This function has a single operand which must be a location; its
result is the current value of that location. The weak interpretation of DEREF is
essentially like that of +; the only difference is in how the function and location
sets are computed. From each location in the location set, obtain the function and
location sets attached to that node; the union of all of those is attached to the top
of the new sm (a node adjoined two sm arcs from the top of the prior sm).
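
The computation of the sets for DEREF can be sketched as follows. The Python names (`SMNode`, `funcs`, `locs`, `succ`, `interpret_deref`) are assumptions made for illustration; a location is represented, as in the text, by a reference to an sm node.

```python
class SMNode:
    """Hypothetical sm node carrying a function set and a location set."""
    def __init__(self, funcs=(), locs=(), succ=None):
        self.funcs = set(funcs)
        self.locs = set(locs)          # each location is a reference to an sm node
        self.succ = succ

def interpret_deref(top):
    """Model DEREF: `top` is the operand, `top.succ` the value of DEREF itself."""
    funcs, locs = set(), set()
    for loc in top.locs:               # union over every possible location
        funcs |= loc.funcs
        locs |= loc.locs
    return SMNode(funcs, locs, top.succ.succ)   # adjoined two arcs below the top

x = SMNode(funcs={"f"})                # variable that may hold function f
y = SMNode(funcs={"g"})                # variable that may hold function g
rest = SMNode(succ=y)
deref = SMNode(succ=rest)              # the value of DEREF
operand = SMNode(locs={x, y}, succ=deref)
result = interpret_deref(operand)
```

Because the operand may denote either location, the result may be either function, so its function set is the union {f, g}.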

We  thus  see  that  accounting  for  assignment  during  weak  interpretation  raises 
no  fundamental  difficulties.  The  simple  technical  device  of  keeping  track  of 
locations,  which  after  all  is  ultimately  what  happens  during  an  actual  execution,  is 
sufficient  to  determine  what  control  paths  are  possible.  The  method  handles 
naturally  even  the  assignment  of  function  and  location  values,  capabilities  not 
allowed  in  many  languages.  The  price  is  extra  expense,  but  it  is  paid  only  when  the 
capability  is  actually  used. 

7.  Flow  of  Control 

The  purpose  of  this  section  is  to  show  how  flow  of  control  other  than  function 
call  and  return  fits  into  the  process  of  graph  construction  by  weak  interpretation. 
In all cases, the flow of control will reflect the semantics of some built-in function.
The  main  point  is  to  show  that  flow  of  control  can  be  handled  with  very  little 
mechanism  over  that  already  presented,  not  to  give  the  most  direct  translation  of 
the  standard  constructs. 

We first consider how a simple if-then-else fits into this scheme. As is
customary, we will assume that only one of the alternatives is evaluated depending
upon the condition. In order to model this in our language, we posit a constant
(built-in) function IF of three arguments, all evaluated. Either the second or third
argument is the result, depending upon the first. Thus, the construct "if term1
then term2 else term3" would be represented as (IF(term1, λ.term2, λ.term3))(), and
the graph structure would be:

IF  G(term1)  #2  #3  CALL 3,  CALL 0,

To give a weak interpretation for the IF function is to describe its effect on a
stack model and on the graph structure. Since the result can be either that of the
top of the stack or next to the top of the stack, we obtain the function set for the
top entry of the new sm by taking the union of the function sets for the two entries
on the top of the stack. Traverse four sm arcs from the top node (three arguments
plus the function), adjoin a new sm node to this point, and attach the new function
set. Then construct an arc from the CALL 3 node to a new flowgraph node, and on
to the new CALL 0 node. The pair consisting of the new sm and new flowgraph
node is the result of interpreting the constant function IF.
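
The effect of IF on the stack model can be sketched in a few lines of Python. The names `SMNode` and `interpret_if` are hypothetical, and only the sm is modeled; the new flowgraph node and arcs are omitted.

```python
class SMNode:
    """Hypothetical sm node: a function set and one out arc."""
    def __init__(self, funcs=(), succ=None):
        self.funcs = frozenset(funcs)
        self.succ = succ

def interpret_if(top):
    """Model IF: the result is either of the two entries on top of the stack."""
    funcs = top.funcs | top.succ.funcs # either alternative may be the result
    base = top
    for _ in range(4):                 # three arguments plus the function
        base = base.succ
    return SMNode(funcs, base)

rest = SMNode()
if_value = SMNode(succ=rest)           # the value of IF
cond = SMNode(succ=if_value)           # value of term1
then_abs = SMNode({"#2"}, cond)        # abstraction for term2
else_abs = SMNode({"#3"}, then_abs)    # abstraction for term3, top of the stack
new_top = interpret_if(else_abs)
```

The subsequent CALL 0 then sees a function set containing both abstractions, so both alternatives are explored.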

We take a similar approach to looping. We map the construct "repeat term
end" to REPEAT(λ.term), where REPEAT is a constant, i.e., built-in, function. The
graph becomes:

REPEAT  #i  CALL 1,

The semantics of the REPEAT operator is to repeatedly call its argument and throw
away the result. Thus, we connect the CALL node to the following graph
construct.

COPY  CALL  0,  POP 

Here,  COPY  means  to  push  the  top  of  the  stack  onto  the  stack,  and  POP  means  to 
simply  remove  the  top  of  the  stack.  The  effect  of  these  pseudo-ops  on  a  stack 
model  may  be  supplied  by  the  reader. 
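
One plausible reading of these two effects, sketched in Python under assumed names (`SMNode`, `interpret_copy`, `interpret_pop`): COPY adjoins a node carrying the same function set as the top, and POP simply takes the node below the top as the new top. A fuller version would copy the location set in the same way.

```python
class SMNode:
    """Hypothetical sm node: a function set and one out arc."""
    def __init__(self, funcs=(), succ=None):
        self.funcs = frozenset(funcs)
        self.succ = succ

def interpret_copy(top):
    """COPY: push a duplicate of the top entry; the prior sm is shared, not changed."""
    return SMNode(top.funcs, top)

def interpret_pop(top):
    """POP: the new sm is whatever lies below the top entry."""
    return top.succ

rest = SMNode()
body = SMNode({"#i"}, rest)            # the abstraction passed to REPEAT
after_copy = interpret_copy(body)
after_pop = interpret_pop(after_copy)
```

In this sketch POP allocates nothing and COPY allocates a single node.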

In  the  REPEAT  example,  note  that  the  flowgraph  node  that  is  the  second 
argument  to  CALL  1,  is  never  reached.  This  naturally  raises  the  question,  what 
about  loops  that  do  terminate?  We  answer  this  question  by  asking  a  different  one: 
how  can  we  model  escape-like  constructs?  Our  answer  is  to  give  the  semantics  and 
weak  interpretation  for  a  very  general  version  of  escape.  We  shall  make  an  escape 
point  a  "first  class  object",  just  like  a  function  or  location.  In  fact  a  return  point 
is  exactly  the  location  of  a  return  address.  While  return  points  may  be  passed  as 
arguments  (or  assigned),  their  ultimate  destination  is  the  first  argument  of  the 
constant function ESCAPE. The second argument of this function is the value to
be returned as the value of the function being escaped. The effect is that
everything in the stack between the top and the static chain link just above the
return point (exclusive), is removed, and the next step is the same as that of the
pseudo-op EXIT.

The mechanism for location sets discussed in the last section is used without
change  when  weakly  interpreting  ESCAPE,  which  is  handled  as  one  would  expect. 
The  top  node  of  the  prior  sm  corresponds  to  the  value  being  returned  on  behalf  of 
the  abstraction.  The  second  sm  node  will  have  a  non-null  location  set  and  each 
location  will  be  that  of  a  return  address  (if  either  condition  is  not  true,  there  is  an 
error).  Although  it  is  not  necessary,  it  is  convenient  to  have  all  escapes  from  an 
abstraction  collect  at  the  exit  node  for  the  abstraction.  To  aid  in  this,  it  is  helpful 
to  have  an  index  of  the  abstraction  in  the  top  node  of  the  sm  attached  to  its  entry 
node  (the  return  address),  and  a  way  to  get  from  abstraction  index  to  exit  node. 

For  each  location  in  the  location  set  attached  to  the  second  node  of  the  prior 
sm,  locate  the  EXIT  node  for  the  corresponding  abstraction,  and  establish  a 
flowgraph  arc  from  the  CALL  node  invoking  the  ESCAPE  to  the  EXIT  node.  Copy 
the  top  node  of  the  prior  sm,  but  make  it  point  to  the  static-chain  just  above  the 
escape location (via the abstraction index). The set of pairs, consisting of sm's
obtained  in  this  way  and  the  corresponding  EXIT  node  of  the  abstraction,  is  the 
result  of  the  weak  interpretation  of  the  constant  function  ESCAPE. 

We now return to the issue of looping constructs with exits. For example,
consider the standard "while term1 do term2". This can be viewed in our language
as

WHILE(term1, term2)

But WHILE does not need to be built in because it can be defined as follows:

λ condition, body. repeat if condition() then exit function else body() endif
end repeat




The repeat and if-then-else constructs have been discussed; the exit function maps
directly to an ESCAPE with suitable argument.

In summary, the flow of control constructs found in most languages can be
included in this general scheme with little difficulty. The "if-then-else" construct
requires only a new constant function and no new pseudo-ops. The "repeat" construct
requires a constant function and two simple pseudo-ops for manipulating the top of
the stack. The escape mechanism requires an extra abstraction index in the stack model
and a table of exit nodes, as well as a constant function (and no new pseudo-ops),
but it provides a powerful capability. Together with ordinary function call and
procedure parameters, other control constructs are easily described.

Bibliography 

1.  Wegbreit,  Ben.  "Property  Extraction  in  Well-founded  Property  Sets",  Center 
for  Research  in  Computing  Technology,  Harvard  University,  Cambridge, 
Massachusetts  and  Computer  Science  Division,  Bolt,  Beranek  and  Newman,  Inc., 
Cambridge,  Massachusetts,  February,  1973. 

2.  Kam,  J.B.  and  Ullman,  J.D.  "Monotone  data  flow  analysis  frameworks"  Acta 
Informatica  7:3,  1977.  pp  305-318. 


The  RULOG  Inferencing  Engine 

Thomas  E.  Cheatham,  Jr. 

Harvard  University 
and 

Software  Options,  Inc. 

Cambridge,  MA  02138 

February  14,  1985 

1  Introduction 

The development of sophisticated expert systems depends upon obtaining
the knowledge of experts and developing efficient means for representing
that knowledge and drawing inferences from it. Some of the more successful
expert systems approach the efficiency issue by combining computationally
tractable models specific to very narrow domains with specializations of
general purpose problem-solving methods applicable to a wide variety of
problem domains. The general purpose component used in many expert
systems is rule-based, with production rules representing knowledge and
the associated mechanisms for drawing inferences.

In other systems it has seemed very natural to use the predicate calculus
to represent knowledge. However, until recently, the inferencing algorithms
for pure predicate calculus were too inefficient for this approach to
be a practical alternative to the less formal rule-based approach. Additionally,
if predicate calculus knowledge inferencing is done using the resolution
method of theorem proving, it is difficult to explain conclusions or even
inferences in terms that make sense to a user. Recently, the emergence of
logic programming in general and PROLOG in particular suggests that
there may be practical realizations of inferencing engines for predicate
calculus. The theory of logic programming imposes a syntactic restriction on
formulas in the predicate calculus, limiting the formulas to what are called
definite clauses. The resulting language has proved adequate for a wide
range of applications, and there are well understood techniques for
constructing an interpreter (that is, an inferencing engine) for definite clauses
that is reasonably efficient. Also, with logic programming the usual
representation of the proof of some predicate is a “proof tree” that provides a
very natural framework for explaining why a particular proof was successful
as well as exploring why some attempted step in a proof was not successful.

Because  logic  programs  can  be  given  a  precise  semantics,  they  are 
amenable  to  theoretical  analysis.  It  is  even  possible  to  attach  uncertainties 
to  the  rules  in  logic  programs  while  retaining  precise  semantics  [Shapiro 
83]. 

PROLOG is presently the best known and most widely available language
for logic programming and PROLOG implementations exist on systems
ranging from mainframe to personal computers. The Japanese fifth
generation project has, of course, made a strong commitment to PROLOG
as the basis for expert systems of the future [Feigenbaum 83], and a number
of groups are investigating specialized architectures for PROLOG machines.

Yet  no  matter  how  capacious  or  fast  the  underlying  hardware,  we  cannot 
ignore  the  importance  of  the  efficiency  of  the  software.  Our  interest  is  in 
extensions  to  the  inferencing  technology  -  to  the  software  technology  -  to 
improve  the  performance  of  inferencing  engines. 

There are several factors that will improve the inferencing capabilities of
interpreters for logic programs. One improvement is to incorporate special-purpose
inferencing components for specific domains. For example, existing
interpreters can infer, from the facts x < y and y < x, that the two facts are
mutually inconsistent only when x and y have constant values that can
be compared. To handle the general case would require axiomatizing the
less-than predicate; such an approach is impractical. There are, however,
efficient satisfiability procedures for systems of linear inequalities [Nelson
81]. Another example is sets of equalities and disequalities. Again, to
handle general equalities and disequalities with present-day interpreters,
we must axiomatize these concepts, inducing computationally infeasible
interpretations; and, again, there do exist efficient decision procedures for
handling equalities and disequalities [Nelson 79].
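
To illustrate what a special-purpose component buys, the following Python sketch decides consistency for the simplest case only: a set of facts of the form a < b over symbolic names, with no constants or arithmetic. It is not the procedure of [Nelson 81], merely a stand-in; it relies on the fact that such a set of strict inequalities is satisfiable exactly when the graph of the facts has no cycle. The function name `inconsistent` is hypothetical.

```python
def inconsistent(less_than):
    """Return True if the facts (a, b), each meaning a < b, cannot all hold."""
    graph = {}
    for a, b in less_than:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}                         # 1 = on the current path, 2 = finished

    def on_cycle(v):
        state[v] = 1
        for w in graph[v]:
            if state.get(w) == 1 or (w not in state and on_cycle(w)):
                return True            # found a chain v < ... < v
        state[v] = 2
        return False

    return any(v not in state and on_cycle(v) for v in graph)

clash = inconsistent([("x", "y"), ("y", "x")])
fine = inconsistent([("x", "y"), ("y", "z")])
```

No values for x and y are needed: the contradiction is detected from the shape of the facts alone.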

Another  major  limitation  of  the  current  generation  of  logic  program 
interpreters  is  in  their  ability  to  deal  with  large  numbers  of  ground  facts. 
The  difficulty  is  not  just  in  being  able  to  store  large  numbers  of  facts  but, 
rather,  in  managing  them,  ensuring  that  they  are  up  to  date  and  consistent. 

A  third  limitation  of  current  logic  program  interpreters  is  that  they  lack 
a  type  system.  The  same  argument  can  be  made  for  logic  programs  that 
is  made  for  other  kinds  of  programs  -  that  incorporating  a  type  system 
becomes  essential  when  the  knowledgebase  becomes  sufficiently  large. 

We have recently developed an interpreter for a language based on definite
clauses that has the power and generality of current logic programming
interpreters but does not suffer the limitations cited above. Aspects of this
language and its interpreter will be described in the sections following. In
order to distinguish our rule language we call it RULOG (for RULes in
LOGic). The term RULOG is also intended to suggest that our intended
application is knowledge representation and inferencing, and not programming
as is the case with PROLOG.

2  The  RULOG  Type  System 

It is, of course, possible to simulate a type system within PROLOG; [Mishra
] describes one way of imposing types. However, we have chosen to include
a type system embedded directly within the RULOG language. The type
system we have chosen for RULOG is modeled on the Ada type system for
scalars, extended to permit the definition of functional types. For example,
the following are type definitions in RULOG.

•  type  COLOR  is  (RED,  BLUE,  YELLOW,  GREEN,  MAUVE) 

•  type  STOP-LIGHT  is  (RED,  YELLOW,  GREEN) 

•  type  state  is  (odd,  even) 

•  type  small  is  new  INTEGER  range  1..100 

•  subtype little is small range 2..10


•  type  arith  is  function  (INTEGER,  INTEGER)  return  INTEGER 

•  type  rel  is  function  (INTEGER,  INTEGER)  return  BOOLEAN 

The  type  COLOR  is  an  enumeration  type  with  five  distinct  values 
named  by  the  literals  RED,  ...  ,  MAUVE.  As  with  Ada,  enumeration 
literals  may  be  “overloaded”  (as  are  RED,  YELLOW,  and  GREEN  above) 
and  the  type  ambiguity  is  resolved  by  conversion,  as  in 

...COLOR(RED)... 

or  by  context,  as  in 

subtype NOGO is STOP-LIGHT range RED..YELLOW

RULOG  goes  beyond  Ada  in  permitting  the  definition  of  functional 
types;  the  type  arith  above  is  that  of  a  function  taking  two  INTEGER 
arguments  and  returning  an  INTEGER  result.  A  predicate  is  a  function 
returning  the  (built-in)  type  BOOLEAN. 

Subtypes are, as in Ada, constraints on an underlying type. Thus,
referring to the above examples, an object with subtype little has type
small and is constrained to have a value between 2 and 10.
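The type rules above can be illustrated with a minimal sketch in Python
(not RULOG). The class and function names are hypothetical and are not
taken from the RULOG implementation; the sketch only shows how an
overloaded enumeration literal might be resolved by context and how a
range constraint might be checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumType:
    name: str
    literals: tuple


@dataclass(frozen=True)
class RangeType:
    name: str
    parent: str  # name of the underlying type
    low: int
    high: int

    def admits(self, value):
        """True if value satisfies the range constraint."""
        return self.low <= value <= self.high


COLOR = EnumType("COLOR", ("RED", "BLUE", "YELLOW", "GREEN", "MAUVE"))
STOP_LIGHT = EnumType("STOP-LIGHT", ("RED", "YELLOW", "GREEN"))
small = RangeType("small", "INTEGER", 1, 100)
little = RangeType("little", "small", 2, 10)


def resolve(literal, enum_types, expected=None):
    """Resolve an overloaded literal to one type name.

    `expected` models the type demanded by context or by an explicit
    conversion such as COLOR(RED); without it an overloaded literal
    is ambiguous and is rejected.
    """
    names = [t.name for t in enum_types if literal in t.literals]
    if expected is not None:
        names = [n for n in names if n == expected]
    if len(names) != 1:
        raise TypeError(f"cannot resolve {literal!r}: candidates {names}")
    return names[0]
```

In this sketch resolve("MAUVE", [COLOR, STOP_LIGHT]) needs no context,
whereas RED is accepted only when an expected type is supplied.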

3  The  RULOG  Language 

We  can  think  of  RULOG  as  having  three  kinds  of  statements:  definitions, 
assertions,  and  dialogue. 

Definitions 

Several sorts of things can be defined in RULOG:

types


Commencing  with  the  built-in  types  INTEGER,  STRING,  and  BOOLEAN, 
one  can  define  (name)  new  types  as  enumerations  and  as  derived  from  an 
existing  type,  possibly  including  a  constraint  on  the  parent  type.  Subtypes 
are  defined  as  constraints  on  some  existing  type.  The  identifiers  used  to 
name  types  and  subtypes  may  not  be  used  for  any  other  purpose. 

literals 

A  literal  in  RULOG  is  like  a  variable  in  a  programming  language  (the  term 
variable,  however,  being  reserved  in  RULOG  to  mean  a  quantified  variable 
in  some  rule).  An  example  of  a  literal  definition  is 

let  S:  state  initially  odd 

that  defines  S  to  be  a  literal  with  type  state  and  initial  value  odd. 

tokens 

A token names a scalar, function, or predicate whose value is entirely
determined by subsequent assertions. Some examples are

•  token  x  is  a  INTEGER 

•  token  s  is  a  function(INTEGER)  returns  INTEGER 

•  token  p  isa  function(INTEGER,  INTEGER)  returns  BOOLEAN

defining x as an INTEGER, s as a function from INTEGERs to INTEGERs,
and p as a predicate on pairs of INTEGERs.


database  predicates 

A database predicate defines a mapping between predicate symbols in
RULOG assertions and tuples in a relational database. As an example

define  has_age(name:STRING,  age:  small) 
on  employees  of  DB 

introduces the predicate symbol “has_age”. This predicate is to be taken as
true for each two-element subtuple, taken from a tuple of the “employees”
relation of the database named “DB”, that consists of the value of the
attribute named “name” and that of the attribute named “age”. The types of
the name and age components are, within RULOG, understood to be STRING
and small, respectively. The transactions between RULOG and the database
access mechanisms will be discussed below.


Assertions 

Once we have defined a sufficient collection of types, tokens, function
symbols, and predicate symbols we can make assertions to RULOG. Assertions
take two basic forms: assertions about uninterpreted predicates and
assertions about tokens. The assertions about uninterpreted predicates are,
semantically, similar to the rules in PROLOG; an example is

assert forall(e:STRING, a:small) young(e) if
has_age(e,a) and a ≤ 25

stating that a “young” employee is one whose age does not exceed 25. In
general a rule has a forall(...) prefix that names the quantified variables
and gives their types, followed by a conclusion and a conjunction of
premisses. As with PROLOG, the rule asserts that the conclusion can be
established by proving that each of the premisses is true. Each premiss is
a predicate applied to zero or more terms; terms include literals, tokens
naming scalars, quantified variables, and tokens naming functions that are
applied to terms, recursively. The major differences from ordinary logic
programming rules are that the quantified variables are typed, that the
interpretation of several of the built-in predicates, such as =, ≠, and ≤,
is handled somewhat differently, and that certain of the predicates (the
database predicates) require communication with a database system in order
to be established. We will discuss the interpretation of the built-in
predicates and database predicates below.

The assertions about tokens involve the equality, disequality, and
inequality predicates. An example, assuming that x, y, and z had been
defined as INTEGER-valued tokens, is:

assert x ≤ y
assert y ≤ z
assert z ≤ x

Subsequent to receiving these assertions RULOG would know that, whatever
values x, y, and z have, they all have the same value and, for example, it
would recognize that the assertion

assert x ≠ y

was not valid (technically, it is unsatisfiable).
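The inference in this example can be reproduced with a small sketch in
Python. It is not the procedure used inside RULOG; it swaps in plain graph
reachability over the asserted "less than or equal" facts, and every name
in it is hypothetical. Two tokens are forced to be equal exactly when each
can be reached from the other.

```python
from collections import defaultdict


class LeqFacts:
    """Stores asserted facts of the form a <= b between named tokens."""

    def __init__(self):
        self.succ = defaultdict(set)  # a -> {b : a <= b was asserted}

    def assert_leq(self, a, b):
        self.succ[a].add(b)

    def reaches(self, a, b):
        """True if a <= b follows from the facts by transitivity."""
        seen, stack = set(), [a]
        while stack:
            node = stack.pop()
            if node == b:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.succ[node])
        return False

    def forced_equal(self, a, b):
        """a <= b and b <= a together force a = b."""
        return self.reaches(a, b) and self.reaches(b, a)

    def diseq_satisfiable(self, a, b):
        """Asserting a != b is unsatisfiable once a = b is forced."""
        return not self.forced_equal(a, b)


facts = LeqFacts()
facts.assert_leq("x", "y")
facts.assert_leq("y", "z")
facts.assert_leq("z", "x")
x_differs_from_y = facts.diseq_satisfiable("x", "y")  # the rejected assertion
```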

Dialogue 


One  kind  of  client  for  RULOG  will  be  some  process  that  wishes  to  determine 
whether  or  not  some  predicate  can  be  proved;  an  example  of  a  request  for 
a  proof  would  be 

prove exists(E:STRING) young(E)

that  would,  presumably,  scan  the  employee  database  to  find  an  employee 
who  is  25  or  younger. 

Another kind of client is a knowledge engineer who is interested in
exploring or debugging some set of rules. RULOG provides a dialogue
interface for this client, offering a set of commands that permit the user
to stop the interpreter at various points in attempting a proof; to examine
the values of variables, literals, and so on; to request explanations of why a
particular predicate was determined to be true or why it failed to be true;
and so on. We refer the interested reader to the RULOG user manual (see
[Cheatham]) for further discussion of the dialogue facilities.


4  The  RULOG  Interpreter 

The RULOG interpreter has two major components: the reader and the
prover. The reader accepts definitions, assertions, and dialogue commands,
performs appropriate syntactic and semantic checks, and creates an internal
representation for the items defined. Thus, the reader is analogous to the
syntactic and semantic analysis components of an Ada compiler. The source
of input is usually one or more files, but input can be typed in directly as
well. Syntactically invalid statements are rejected and an appropriate
comment on the problem encountered is given; a statement can be resubmitted
after editing if the user is typing directly. The semantic analysis resolves
the types of overloaded literals and ensures that the type of each construct
is consistent with that required; semantic errors also result in the
rejection of the statement, accompanied by appropriate comments on the
problems encountered.

The  prover  is  invoked  by  being  given  some  theorem  to  be  established. 
In  a  fashion  analogous  to  most  logic  program  interpreters,  it  attempts  to 
build  a  proof  tree  in  order  to  establish  that  the  theorem  is  true.  The  proof 
tree  has  as  its  root  the  predicate  to  be  established.  In  general,  at  any  point 
in  the  proof,  the  prover  is  working  on  some  node  of  the  proof  tree  in  an 
attempt  to  establish  that  the  predicate  at  that  node  is  true.  The  operation 
of  the  prover  at  a  given  node  is  dependent  upon  the  sort  of  predicate  at 
the  node;  for  discussion  purposes  we  classify  the  predicates  at  a  node  into 
three  groups,  as  follows: 

Uninterpreted  Predicates 

An uninterpreted predicate is a predicate whose truth is established by
appealing to the assertions that have been made about that predicate (that
is, the rules whose conclusion involves the same predicate symbol as that
of the predicate we are trying to establish). The processing of a node
with an uninterpreted predicate is analogous to the processing done by a
PROLOG interpreter. That is, to satisfy such a predicate we must find
some rule whose conclusion has the same predicate symbol and such that
we can unify each term of the predicate at the node with the corresponding
term of the rule conclusion. If successful, the premisses of the rule are
established as the descendants of the current node and we turn to the next
node of the proof tree; if not successful, the prover must backtrack and
attempt to prove the predicate at some previous node in a different way
(that is, with a different rule or a different fact from the database).
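The unify-and-backtrack cycle just described can be sketched in Python as
a depth-first search written with generators. This is a generic PROLOG-style
illustration, not the RULOG prover: it ignores types, omits the occurs
check, and stores facts as rules with no premisses. The names Var, unify,
prove, and the employee rule are all hypothetical.

```python
import itertools


class Var:
    """A quantified variable; each use of a rule gets fresh copies."""
    _ids = itertools.count()

    def __init__(self, name):
        self.name = name
        self.id = next(Var._ids)


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a value."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution, or None if a and b do not unify."""
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var):
        return subst if a is b else {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


def rename(term, mapping):
    """Copy a term, replacing each variable by a fresh one."""
    if isinstance(term, Var):
        return mapping.setdefault(term, Var(term.name))
    if isinstance(term, tuple):
        return tuple(rename(t, mapping) for t in term)
    return term


def prove(goals, rules, subst):
    """Yield each substitution that establishes every goal in the list."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for head, premisses in rules:
        mapping = {}
        head = rename(head, mapping)
        premisses = [rename(p, mapping) for p in premisses]
        extended = unify(first, head, subst)
        if extended is not None:
            # The premisses become the descendants of the current node.
            yield from prove(premisses + rest, rules, extended)
        # Moving on to the next rule is the backtracking step.


E, A, X = Var("e"), Var("a"), Var("x")
rules = [
    (("has_age", "Henry", 35), []),
    (("has_age", "Alice", 22), []),
    (("employee", E), [("has_age", E, A)]),
]
answers = [walk(X, s) for s in prove([("employee", X)], rules, {})]
```

Each value in answers corresponds to one successful proof tree; asking the
generator for a further value is what forces the search to backtrack.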

Database  Predicates 

The truth of a database predicate is established by appeal to the
appropriate relation in the relational database. At the present time we are
using the TROLL database system, but we think that the modifications
required to use some other database system would be minor. The transactions
required between RULOG and the database system depend both on the predicate
to be established and on the arguments to that predicate. We consider a
couple of examples using the has_age predicate cited earlier. Suppose that
we wanted to establish the truth of

has_age(e,a)

and that, at the time we wished to establish this, e was bound to “Henry”
and a to 35. The transaction required is a query to the database that
will determine and report whether there is a tuple of the employees relation
whose name and age fields are “Henry” and 35, respectively. If, after a
subsequent failure to prove some predicate, we backtrack to this node, there
is no other way to prove it and backtracking will have to continue on to
retry nodes previous to this one.

If, for the same example,

has_age(e,a)

both e and a were unbound quantified variables at the time we wished to
establish that the predicate was true, a rather different transaction with the
database would be required. This time we would request the values of the
name and age attributes of the first tuple of the employees relation and bind
the variables e and a to these values to establish the truth of the predicate.
Upon backtracking to the node with this predicate, the transaction required
is to retrieve the values of the name and age attributes for the next tuple of
the employees relation, continuing, during subsequent backtracking to the
node, until the tuples of the employees relation are exhausted before
backtracking further.
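The two kinds of transaction can be sketched in Python, using the standard
sqlite3 module as a stand-in for TROLL (whose interface is not described
here). The table contents, the dept column, and the function names are
assumptions made for illustration. A generator models the proof-tree node:
each request for another value corresponds to one backtrack into the node.

```python
import sqlite3


def open_db():
    """Build a small in-memory stand-in for the employees relation."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees (name TEXT, age INTEGER, dept TEXT)")
    db.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("Henry", 35, "sales"), ("Alice", 22, "lab"), ("Ravi", 41, "lab")],
    )
    return db


def has_age(db, e=None, a=None):
    """Yield (name, age) pairs that make has_age(e, a) true.

    None stands for an unbound quantified variable.
    """
    if e is not None and a is not None:
        # Both arguments bound: a single membership query, so there is
        # at most one way to prove the predicate.
        row = db.execute(
            "SELECT 1 FROM employees WHERE name = ? AND age = ?", (e, a)
        ).fetchone()
        if row is not None:
            yield e, a
        return
    # Otherwise step through the relation one tuple per backtrack.
    cursor = db.execute("SELECT name, age FROM employees ORDER BY rowid")
    for name, age in cursor:
        if (e is None or e == name) and (a is None or a == age):
            yield name, age


db = open_db()
node = has_age(db)          # both variables unbound
first_binding = next(node)  # initial proof of the node
second_binding = next(node) # first backtrack into the node
```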

Interpreted  Predicates 

Analogous to PROLOG, RULOG has a number of interpreted predicates,
including cut, fail, and the like, to control a proof. Like PROLOG, RULOG
also provides the equality (=), disequality (≠), and inequality (≤)
predicates but has a different interpretation of them that we will discuss
below. RULOG also provides for assignment of new values to literals. An
example is the “predicate”

S  :=  even 

that assigns the value even to the literal S and returns “true”; this ability
to modify certain variables during a proof makes reasoning about situations
that involve some notion of “state” rather more perspicuous than is often
the case with PROLOG.

RULOG’s treatment of =, ≠, and ≤ is different from PROLOG’s. In
PROLOG, x = y is true if x and y are manifestly equal or one of them is a
variable that can be bound to the other; x ≠ y is the failure to show x =
y; and x ≤ y is valid only if both x and y are integers (including variables
bound to integers and arithmetic operations on integers), with the obvious
interpretation.

By contrast, the RULOG meaning of these predicates is provably equal,
disequal, or inequal. We can think of the prover, when it encounters one
of these predicates, as appealing to a specialist who determines whether the
given instance of the predicate is true or not and reports back accordingly.
The specialists provided in RULOG are, as we noted earlier, based on the
satisfiability procedures developed by Nelson (see [Nelson 81]) and
incorporated in the Stanford Program Verifier. We term these specialists E
and R. E maintains a conjunction of the equality and disequality facts that
have been asserted; the equalities are, in general, over terms constructed
from tokens, literals, and uninterpreted functions applied to terms,
recursively. E partitions the terms into equivalence classes and propagates
each new equality asserted so that, for example, the assertion x = y will
cause x and y to be placed in the same equivalence class and, in turn,
f(x) and f(y) to be placed in the same equivalence class. Disequalities are
managed by associating with each equivalence class the list of terms that
are disequal to it; an attempt to add such a term to the equivalence class
that it is forbidden to inhabit will result in unsatisfiability. To prove
that some equality or disequality is true, we demonstrate that conjoining
its converse to E is unsatisfiable. E employs some fairly elaborate data
structures and carefully chosen algorithms that achieve a time cost of the
order of n log n to add an n-th conjunct to the n-1 already known to E.
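The behavior of E can be illustrated by a deliberately naive congruence
closure in Python. It is only a sketch: it rescans all pairs of terms
after every assertion, so it does not achieve the n log n cost cited
above, and it keeps disequalities in one flat list instead of attaching
them to equivalence classes. Terms are strings (tokens) or tuples such as
("f", "x") for f(x); the class and method names are hypothetical.

```python
import copy


class EqualitySpecialist:
    def __init__(self):
        self.parent = {}   # union-find forest over known terms
        self.diseqs = []   # pairs of terms asserted to be disequal

    def add_term(self, t):
        if t not in self.parent:
            self.parent[t] = t
            if isinstance(t, tuple):
                for arg in t[1:]:
                    self.add_term(arg)

    def find(self, t):
        """Representative of the equivalence class containing t."""
        self.add_term(t)
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def _congruent(self, s, t):
        """Same function symbol applied to pairwise-equal arguments."""
        return (isinstance(s, tuple) and isinstance(t, tuple)
                and s[0] == t[0] and len(s) == len(t)
                and all(self.find(a) == self.find(b)
                        for a, b in zip(s[1:], t[1:])))

    def _close(self):
        """Merge congruent terms until nothing more changes."""
        changed = True
        while changed:
            changed = False
            terms = list(self.parent)
            for i, s in enumerate(terms):
                for t in terms[i + 1:]:
                    if self.find(s) != self.find(t) and self._congruent(s, t):
                        self.parent[self.find(s)] = self.find(t)
                        changed = True

    def consistent(self):
        return all(self.find(a) != self.find(b) for a, b in self.diseqs)

    def assert_eq(self, a, b):
        """Add a = b; return False if the facts become unsatisfiable."""
        self.parent[self.find(a)] = self.find(b)
        self._close()
        return self.consistent()

    def assert_neq(self, a, b):
        """Add a != b; return False if the facts become unsatisfiable."""
        self.add_term(a)
        self.add_term(b)
        self._close()
        self.diseqs.append((a, b))
        return self.consistent()

    def proves_eq(self, a, b):
        """a = b is proved if adding a != b to a copy is unsatisfiable."""
        trial = copy.deepcopy(self)
        return not trial.assert_neq(a, b)


e = EqualitySpecialist()
e.add_term(("f", "x"))
e.add_term(("f", "y"))
still_satisfiable = e.assert_eq("x", "y")
fx_equals_fy = e.proves_eq(("f", "x"), ("f", "y"))
```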

The R specialist maintains a conjunction of the inequalities that have been
asserted. It converts each inequality to an equality by introducing a
so-called restricted variable; restricted variables are constrained to be
nonnegative and R ensures that these constraints are met. The addition of a
new inequality may result in the discovery of one or more equalities that are
implied by the new inequality (as, for example, z ≤ x added to x ≤ y and
y ≤ z would result in x = y and y = z being discovered); all equalities
discovered are reported to E. The addition of a new inequality might also
force a restricted variable to be negative, in which case the set of
inequalities submitted to R is unsatisfiable. To prove that, for example,
x > y, R shows that the addition of the converse, x ≤ y, to the set of
inequalities it currently has results in unsatisfiability.

The R specialist is also used to ensure that the range constraints
associated with types and subtypes are not violated. Suppose we have

type  small  is  new  INTEGER  range  1..100 


assert forall(..., V:small) p(...V...) if ...

Whenever the cited rule for p is involved in a proof, we must ensure that
1 ≤ V and V ≤ 100. This is handled by submitting the two inequalities to R
and, if any subsequent assertion about V contradicts the constraint, R will
report the unsatisfiability; this is interpreted as a failure in the proof
and initiates backtracking.
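The effect of a range constraint on a proof can be sketched in Python with
simple interval bookkeeping. This is not how R works internally (R uses
restricted variables as described above); the sketch only handles bounds
against constants, and the class and method names are hypothetical. A
result of False is the signal on which the prover would backtrack.

```python
class RangeStore:
    """Tracks a closed integer interval [low, high] for each variable."""

    def __init__(self):
        self.bounds = {}

    def declare(self, var, low, high):
        # Using a rule with V:small submits 1 <= V and V <= 100.
        self.bounds[var] = (low, high)

    def satisfiable(self, var):
        low, high = self.bounds[var]
        return low <= high

    def assert_at_least(self, var, k):
        """Add k <= var; return False if no value can now satisfy it."""
        low, high = self.bounds[var]
        self.bounds[var] = (max(low, k), high)
        return self.satisfiable(var)

    def assert_at_most(self, var, k):
        """Add var <= k; return False if no value can now satisfy it."""
        low, high = self.bounds[var]
        self.bounds[var] = (low, min(high, k))
        return self.satisfiable(var)


r = RangeStore()
r.declare("V", 1, 100)
ok_after_lower = r.assert_at_least("V", 50)  # 50 <= V is compatible
ok_after_upper = r.assert_at_most("V", 40)   # V <= 40 contradicts 50 <= V
```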


Future  Directions 


We think of the present RULOG as a prototype in which we have demonstrated
the feasibility of combining the basic mechanism of PROLOG with a type
system, a set of specialists that are very efficient at determining the
satisfiability of predicates over restricted domains, and a connection to a
database system to provide a source of ground facts. In addition, RULOG
provides a reasonable user interface and facilities for explaining why a
proof succeeded or failed.

There are a number of additions to RULOG that we intend to investigate
before we do the final round of engineering to ensure that it is a reliable
and robust system appropriate for distribution.

At present, RULOG has no mechanism for dealing with arbitrary collections
of objects. We have rejected the idea of incorporating lists in the way
that PROLOG does to provide for dealing with collections. Instead, we are
exploring the possibility of adding sets to the language, complete with the
usual set operations, set constructors, set iterators, and the like. The
experience with SETL (see [Schwartz 74]) and work by Sandhu (see [Sandhu
81]) suggest that the use of sets and set notations might be a very natural
and user-friendly way to deal with arbitrary collections.

Another addition that is required for many applications is some notion
of certainty (or, equivalently, fuzzy predicates); [Shapiro 83] discusses how
this might be added to PROLOG and we believe a similar addition to RULOG
to be possible.

At present, the connection between RULOG and the database system
(TROLL) is rather loose: RULOG runs on an Apollo and the database system
on a VAX. We intend to investigate both connections to other databases
and a tighter coupling of the RULOG and TROLL processes (possibly even
combining them into a single process on one computer).

The versions of E and R currently operational in RULOG are satisfiability
procedures, not decision procedures (that is, they do not bind quantified
variables). We believe it straightforward to make E into a decision
procedure and are exploring various ways to extend R to be able to do
variable binding as suggested in [Townley 80].


REFERENCES 


[Cheatham]  Cheatham,  T.  E.,  The  RULOG  User  Manual.  In  preparation. 

[Feigenbaum 83] Feigenbaum, E. A., and McCorduck, P., The Fifth
Generation. Addison-Wesley, 1983.

[Mishra]  Mishra,  P.,  Polymorphic  type  inference  in  PROLOG.  Extended 
summary.  CS  Department.  University  of  Utah,  Salt  Lake  City,  Utah. 

[Nelson 79] Nelson, G., and Oppen, D., Simplification by Cooperating
Decision Procedures. ACM Trans. on Programming Languages and
Systems 1, 2 (October 1979), 245-257.

[Nelson  81]  Nelson,  G.,  Techniques  for  Program  Verification.  CSL-81-10, 
Xerox  Palo  Alto  Research  Center,  June,  1981. 

[Sandhu  81]  Sandhu,  R.  S.,  The  Case  for  a  SETL  Based  Query  Language, 
LCSR  TR-24,  Rutgers  University,  1981. 

[Schwartz  74]  Schwartz,  J.  T.,  On  Programming:  An  interim  report  on  the 
SETL  Project.  Installments  I  and  II.  CIMS,  New  York  University,  1974. 

[Shapiro 83] Shapiro, E., Logic Programs with Uncertainties - a Tool for
Implementing Rule Based Systems, IJCAI 1 (1983), 529-532.

[Townley  80]  Townley,  J.  A.,  A  Pragmatic  Approach  to  Resolution-based 
Theorem  Proving.  Int.  Jr.  on  Computer  and  Information  Sciences  9,  2, 
(1980),  93-116. 




